-----------------------------
INSIDE TURBO PASCAL 6.0 UNITS
-----------------------------
by
William L. Peavy
-----------------
Revised: April 16, 1991
ABSTRACT
If you want to know what is in a .TPU (unit) file produced
by Version 6.0 of Turbo Pascal from Borland International,
then this paper is for you. It doesn't explain quite
everything, since I don't have access to secret documents
or anything like that, and since some of the data in .TPU
files just doesn't have enough auxiliary information to make
its role clear. However, it is possible to learn a great
deal about how Turbo Pascal organizes the information it
needs to refer to, and it is also possible to learn just
what kind of code the compiler produces.
This is the third in a series of reports on the subject of
Turbo Pascal Units, the first dealing with Turbo Pascal
Version 5.0 and the second with Turbo Pascal 5.5. The
evolution of these files in the face of changing
requirements has been fascinating to behold and deciphering
their contents has been challenging to say the least.
The programs supplied with this report have been reorganized
from their 5.5 style in some ways and many identifiers have
been changed. These changes were more for style than for
substance. Other changes were dictated by the changes in
the organization of the TPU file itself and certain errors
in the 5.5 programs have been corrected. In addition, other
errors of interpretation have been fixed which has led to
some enhanced descriptive capability.
Since I have a "real" job which requires my full attention,
and since it doesn't involve use of these products in any
direct way, I am usually hard-pressed to find the personal
time to conduct this research. Consequently, I always
refuse to commit to follow-up or even error correction. It
would be irresponsible of me to pretend it could be
otherwise. Even so, this is a revised report which contains
a few error fixes and discusses the newly enhanced program
which incorporates these fixes and sports some enhanced
capabilities.
Contents
Introduction
1. Gross File Structure
   1.1 User Units
2. Locators
   2.1 Local Links
   2.2 Global Links
   2.3 Table Offsets
   2.4 Basic Relationships
3. Unit Header
   3.1 Description
   3.2 UNIT Size
4. Symbol Dictionaries
   4.1 Organization
   4.2 Interface Dictionary
   4.3 Debug Dictionary
   4.4 Dictionary Elements
      4.4.1 Hash Tables
         4.4.1.1 Size
         4.4.1.2 Scope
         4.4.1.3 Special Cases
      4.4.2 Dictionary Headers
      4.4.3 Dictionary Stubs
         4.4.3.1 Label Declaratives ("O")
         4.4.3.2 Un-Typed Constants ("P")
         4.4.3.3 Named Types ("Q")
         4.4.3.4 Variables, Fields, Typed Cons ("R")
         4.4.3.5 Subprograms & Methods ("S")
         4.4.3.6 Turbo Std Procedures ("T")
         4.4.3.7 Turbo Std Functions ("U")
         4.4.3.8 Turbo Std "NEW" Routine ("V")
         4.4.3.9 Turbo Std Port Arrays ("W")
         4.4.3.10 Turbo Std External Variables ("X")
         4.4.3.11 Units ("Y")
      4.4.4 Type Descriptors
         4.4.4.1 Scope
         4.4.4.2 Prefix Part
         4.4.4.3 Suffix Parts
            4.4.4.3.1 Un-Typed
            4.4.4.3.2 Structured Types
               4.4.4.3.2.1 ARRAY Types
               4.4.4.3.2.2 RECORD Types
               4.4.4.3.2.3 OBJECT Types
               4.4.4.3.2.4 FILE (non-TEXT) Types
               4.4.4.3.2.5 TEXT File Types
               4.4.4.3.2.6 SET Types
               4.4.4.3.2.7 POINTER Types
               4.4.4.3.2.8 STRING Types
            4.4.4.3.3 Floating-Point Types
            4.4.4.3.4 Ordinal Types
               4.4.4.3.4.1 "Integers"
               4.4.4.3.4.2 BOOLEANs
               4.4.4.3.4.3 CHARs
               4.4.4.3.4.4 ENUMERATions
            4.4.4.3.5 SUBPROGRAM Types
5. Maps and Lists
   5.1 PROC Map
   5.2 CSeg Map
   5.3 Typed CONST DSeg Map
   5.4 Global VAR DSeg Map
   5.5 Donor Unit List
   5.6 Source File List
   5.7 DEBUG Trace Table
6. Code, Data, Fix-Up Info
   6.1 Object CSegs
   6.2 CONST DSegs
   6.3 Fix-Up Data Table
7. Supplied Program
   7.1 TPU6
      7.1.1 UNIT TPU6AMS
      7.1.2 UNIT TPU6EQU
      7.1.3 UNIT TPU6UTL
      7.1.4 UNIT TPU6RPT
      7.1.5 UNIT TPU6UNA
   7.2 Modifications
   7.3 Notes on Program Logic
      7.3.1 Formatting the Dictionary
      7.3.2 The Disassembler
8. Unit Libraries
   8.1 Library Structure
9. Application Notes
10. Acknowledgements
11. References
INDEX
Inside TURBO Pascal 6.0 Units
----------------------------------------------------------------------
INTRODUCTION
This document is the outcome of an inquiry conducted into the
structure and content of Borland Turbo Pascal (Version 6.0) Unit
files. The original purpose of the inquiry was to provide a body of
theory enabling Cross-Reference programs to resolve references to
symbols defined in .TPU files where qualification was not explicitly
provided. As is so often the case, one thing led to another and the
scope of the inquiry was expanded dramatically. While this document
should not be regarded as definitive, the author feels that the entire
Turbo Pascal User community might gain from the information extracted
from these files at the cost of so much time and effort.
The material contained herein represents the findings and
interpretations of the author. A great deal of guess-work was
required and no assurances are given as to the accuracy of either the
findings of fact or the inferences contained herein which are the sole
work-product of the author. In particular, the author had access only
to materials or information that any normal Borland customer has
access to. Further, no Borland source-codes were available as the
Library Routine source is not licensed to the author. In short, there
was nothing irregular about how these findings were achieved.
The material contained herein is placed in the public domain free of
copyright for use of the general public at its own risk. The author
assumes no liability for any damages arising from the use of this
material by others. If you make use of this information and you get
burned, TOUGH! The author accepts no obligation to correct any such
errors as may exist in the supplied programs or in the findings of
fact or opinion contained herein. On the other hand, this is not a
"complete" work in that a great many questions remain open, especially
as regards fine details. (The author is not highly-qualified in Intel
80xxx Assembly Language and several open questions might best be
addressed by persons competent in this area.) The author welcomes the
input of interested readers who might be able to "flesh-out" some of
these open questions with "hard" answers.
1. GROSS FILE STRUCTURE
A Turbo Pascal Unit file consists of an array of bytes that is some
exact multiple of sixteen (16). "Signature" information allows the
compiler to verify that the .TPU file was compiled with the correct
compiler version and to verify that the file is of the correct size.
The fine structure of the file will be addressed in later sections at
ever increasing levels of detail.
----------------------------------------------------------------------
Rev: April 16, 1991 Page 5
Graphically, the file may be regarded as having the following general
layout:
+-------------------+
| Unit Header | Main Index to Unit File
|-------------------|
| Dictionaries: |
| a) Interface |
| b) Debug * | For Local Symbol Access
|-------------------|
| PROC Map |
|-------------------|
| CSeg Map * | May be Empty
|-------------------|
| CONST DSeg Map * | May be Empty
|-------------------|
| VAR DSeg Map * | May be Empty
|-------------------|
| Donor Units * | May be Empty
|-------------------|
| Source Files |
|-------------------|
| Trace Table * | May be Empty
|-------------------|
| CODE Segment(s) * | May be Empty
|-------------------|
| DATA Segment(s) * | May be Empty
|-------------------|
| FIX-UP Data * | May be Empty
+-------------------+
1.1 USER UNITS
Units prepared by the compiler available to ordinary users have a very
straight-forward appearance and content. There may even be a little
"wasted" space that might be removed if the compiler were just a
little cleverer. The SYSTEM.TPU file is quite another thing however.
The SYSTEM.TPU file (found in TURBO.TPL) is extraordinary in that
great pains seem to have been taken to compact it. Further, it
contains a great many types of entries that just don't seem to be
achievable by ordinary users and I suspect that much (if not all) of
it was "hand-coded" in Assembler Language.
In the following sections, the details of these optimizations will be
explained in the context of the structural element then under
discussion.
2. LOCATORS
The data in these files has need of structure and organization to
support efficient access by the various programs such as the compiler,
the linker and the debugger. This organization is built on a solid
foundation of locators employed in the unit's data structures.
2.1 LOCAL LINKS
Local Links (LL's) are items of type WORD (2 bytes) which contain an
offset which is relative to the origin of the unit file itself. This
implies that a unit must be somewhat less than 64K bytes in size. If
the .TPU file is loaded into the heap, then an LL can be used to
locate any byte in the segment beginning with the load point of the
file.
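As a minimal sketch of how an LL might be followed, assume the whole
.TPU file has been read into memory as a byte string and that WORDs
are stored low byte first, as is usual on Intel processors. The helper
names read_word and follow_ll are illustrative only and are not taken
from the supplied programs.

```python
import struct

def read_word(image: bytes, offset: int) -> int:
    """Return the little-endian WORD stored at `offset` in the unit image."""
    return struct.unpack_from("<H", image, offset)[0]

def follow_ll(image: bytes, ll_offset: int) -> int:
    """Read the LL stored at `ll_offset` and return the file offset it names.

    An LL is relative to the origin of the unit file, so its value is
    already a direct index into `image`.
    """
    target = read_word(image, ll_offset)
    if target >= len(image):
        raise ValueError("LL points past the end of the unit image")
    return target

# Fabricated six-byte image: the WORD at offset 0 holds the LL 0004h.
demo = bytes([0x04, 0x00, 0xAA, 0xBB, 0xCC, 0xDD])
target = follow_ll(demo, 0)
print(target, hex(demo[target]))
```

Because an LL is a 16-bit quantity, the bounds check can only fail for
images shorter than 64K bytes, which agrees with the size limit noted
above.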
2.2 GLOBAL LINKS
Global Links (LG's) are used to locate type descriptors and to locate
allocation data for variables with the ABSOLUTE attribute which may
reside in other Units (i.e., units external to the present unit).
LG's are structured items consisting of two (2) words. The first of
these is an LL that is relative to the origin of the (possibly)
external unit. It locates either a Type Descriptor or the stub of the
Dictionary entry which establishes storage allocation. The second
word is an LL which locates the stub of the unit entry in the current
unit dictionary for the (possibly) external unit. This dictionary
entry provides the name of the unit that contains the item the LG
points to.
This provides a handy mechanism for locating type descriptors and
allocation information which may be defined in other separately
compiled units.
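The two-word layout of an LG can be sketched as follows. This is an
illustration under the assumption of little-endian WORDs; the names
GlobalLink, target, unit_stub and read_lg are hypothetical and do not
come from the supplied programs.

```python
import struct
from typing import NamedTuple

class GlobalLink(NamedTuple):
    target: int     # LL relative to the origin of the (possibly external) unit
    unit_stub: int  # LL, in the current unit, of the owning unit's entry stub

def read_lg(image: bytes, offset: int) -> GlobalLink:
    """Decode the two-word LG stored at `offset` in the unit image."""
    target, unit_stub = struct.unpack_from("<HH", image, offset)
    return GlobalLink(target, unit_stub)

# Fabricated four-byte image holding the words 1234h and 5678h.
demo = bytes([0x34, 0x12, 0x78, 0x56])
lg = read_lg(demo, 0)
print(hex(lg.target), hex(lg.unit_stub))
```

Resolving the first word fully would also require loading the unit
named by the dictionary entry that the second word locates. That step
is not shown here.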
2.3 TABLE OFFSETS
Finally, various data-structures within a .TPU file are organized as
arrays of fixed-length records or as lists of variable-length records.
Efficient access to such records is achieved by means of offsets
rather than subscripts (an addressing technique denied Pascal). These
offsets are relative to the origin of the array or list being
referenced rather than the origin of the unit.
2.4 BASIC RELATIONSHIPS
+-------------+ +----------------------+
| Unit | | INTERFACE Dictionary |
| Header | | |
+-------------+ | Public and Private |
| | Names, Nested Hash |
| LL +----------------+ LL's | Tables, INLINE code, |
|-------->| INTERFACE Hash |------->| Type Descriptors. |
| +----------------+ +----------------------+
| (LL's ^ & LG's)
| +----------------------+
| LL +----------------+ LL's | DEBUG Dictionary |
|-------->| DEBUG Hash |------->| IMPLEMENTATION and |
| +----------------+ | nested scope names, |
| ?| stored for DEBUG. |
| LL +----------------+ | Same structure as in |
|-------->| PROC Map Table | | INTERFACE. Linked |
| +----------------+ | to INTERFACE part by |
| LL +----------------+ | LL's. BUILT ONLY IF |
|-------->| CSeg Map Table |? | LOCAL SYMBOLS ARE |
| +----------------+ | ENABLED AT COMPILE. |
| LL +----------------+ +----------------------+
|-------->| DSeg Map CONST |?
| +----------------+
| LL +----------------+
|-------->| DSeg Map VAR's |?
| +----------------+ IMPORTANT NOTES
| LL +----------------+ ----------------------
|-------->| Donor Unit List|? Some of the structures
| +----------------+ shown in this figure
| LL +------------------+ are built only if they
|-------->| Source File List | are needed. These are
| +------------------+ marked by a "?" next
| LL +------------------+ to the box.
|-------->| Debug Step Ctls |?
| +------------------+ If the DEBUG Dictionary
| ** +---------------+ is missing, its LL
|-------->| CODE Segments |? leads directly to the
| +---------------+ INTERFACE Dictionary.
| ** +-----------------+ ----------------------
|-------->| CONST DATA Segs |?
| +-----------------+
| ** +----------------+
+-------->| Fix-Up Lists |?
+----------------+
This figure illustrates the role of the Unit Header in tying together
the various data structures in the Unit. The type of link is shown
next to a flow-line by "LL", "LG" or "**". "LL" and "LG" are explicit
pointers, while "**" marks a locator whose value is computed from
other data in the Unit Header; no explicit pointer exists for it.
+----(from hash tables,other Dictionary Entries)
|
| +------------------------------------------------+
| | Header Part | Stub Part -- many formats |
+--->| - - - - - - | - - - +------------------------- |
| | data, | Some stubs have embedded | Dictionary
| Name, Class | links | Type Descriptors | Entry
| and link to | (see | +------------------- |
| entries who | below)| | INLINE Declarative |
| have same | * | | code bytes for a |
| hash | | | | "macro" type PROC |
+-----------------|------------------------------+
+----------+
|
| FAR pntr +----------------------------+
|----------->| Absolute Memory Locations |
| +----------------------------+
| +-----------------------------+
| LG's | Type Descriptors and stubs |
|----------->| of Dictionary Entries used |
| | for absolute equivalences |
| +-----------------------------+
| +---------------------------------+
| LL's | Nested Scope Hash Tables |
|----------->| Parent Scope Dictionary Entries |
| | Record Fields |
| | Object Fields/Methods |
| +---------------------------------+
| +----------------------+
| Offsets | CONST DSeg Map Table |
+----------->| PROC Map Table |
| VAR DSeg Map Table |
+----------------------+
This figure illustrates the many types of entities that associate with
Dictionary Entries and particularly with their Stub Parts. Not all of
the links shown occur in a single Stub format, but all of the links in
the figure can and do exist in selected cases. The purpose here is to
show the flexibility of the system of links in associating required
data with the Dictionary Entry and its identifying symbol.
While it may not be apparent from the figure, the dictionary structure
as a whole may be viewed as a cyclic directed graph which is rooted in
the DEBUG Hash Table. The recursive properties exhibited by the node
relationships permit direct support of the scope rules of Turbo Pascal
with simplicity and elegance. As one might expect, the representation
of the required information lends itself to efficient use of storage
since the representations are compact and there is very little in the
way of redundancy. The small amount of redundancy that does exist is
apparently aimed at speeding access to certain structures by the Turbo
components (compiler, linker and debugger).
+----(implied links, explicit LG's from other structures)
|
| +---------------------------------------------+
| | Flags and codes, allocation widths for data | Type
+--->| and VMT's, subrange constraints, formal | Descriptor
| parameter descriptors, implicit associated | Contents &
| type descriptors, LL's, LG's and Offsets. | Linkages
+---------------------------------------------+
|
|
| LG's +------------------+
|-------------->| Type Descriptors |
| +------------------+
|
| +-------------------------------+
| LL's | Method Dictionary Entries |
|-------------->| Nested Scope Hash Tables |
| | Nested Scope Field Chains |
| | Parent Scope Dictionary Entry |
| +-------------------------------+
|
| Offsets +----------------------------------+
+-------------->| VMT pointers in Object Instances |
| CONST DSeg Map Table Entries |
+----------------------------------+
This figure illustrates the relationships between Type Descriptors and
other structures in the dictionary. Not all the links shown can exist
with a single Type Descriptor since there are several variant forms of
these descriptors (depending on base type) but in combination, these
linkages are feasible. In addition to links, a great amount of data
is stored which is peculiar to a given type declaration. Descriptors
can be -- and are -- shared. Indeed, they were designed with that in
mind. Once a named type is declared, all entities that reference it
are linked to it in some way (usually by an LG).
Almost every form of type descriptor is found in the SYSTEM unit and
this fact is used to advantage. When un-typed constants are declared,
a built-in type descriptor is referenced (via an LG) which provides
necessary information for maintenance of orderly dictionary structure.
When a named-type is declared, it is almost always decomposed into an
expression based on the built-in types of Turbo Pascal which are found
in the SYSTEM unit with the aid of an LG.
The semantics underlying the idea of the Unit mandate this very
approach since program modules of any class which make references to
units for definitions use the definitions as implemented by the unit
which contains them. Re-defining the unit or any of its defined types
leads to a natural requirement to re-compile those program modules
which rely on the unit for definitions. The impact is fundamental
since the storage representation of a unit-defined named type can
change in quite radical ways.
3. UNIT HEADER
The Unit Header comprises the first 64 bytes of the .TPU file. It
contains LL's that effectively locate all other sections of the .TPU
file plus statistics that enable a little cross-checking to be
performed. Some parts of the Unit Header appear to be reserved for
future use since no unit examined by this author has ever contained
non-zero data in these apparently reserved fields.
3.1 DESCRIPTION
The Unit Header provides a high-level locator table whereby each major
structure in the unit file can be addressed. The following provides a
Pascal-like explanation of the layout of the header followed by
further narrative discussion of the contents of the individual fields
in the Unit Header.
Type HdrAry     = Array[0..3] of Char;
     LL         = Word;

     UnitHeader = Record
       UHEYE : HdrAry;  { +00 : = 'TPU9'                       }
       UHxxx : HdrAry;  { +04 : = $00000000                    }
       UHUDH : LL;      { +08 : to Dictionary Head-This Unit   }
       UHIHT : LL;      { +0A : to Hash Table (INTERFACE)      }
       UHPMT : LL;      { +0C : to PROC Map                    }
       UHCMT : LL;      { +0E : to CSeg Map                    }
       UHTMT : LL;      { +10 : to DSeg Map-Typed CONST's      }
       UHDMT : LL;      { +12 : to DSeg Map-GLOBAL Variables   }
       UHxxy : LL;      { +14 : Purpose Unknown                }
       UHLDU : LL;      { +16 : to Donor Unit List             }
       UHLSF : LL;      { +18 : to Source file List            }
       UHDBT : LL;      { +1A : to Debug Trace Step Controls   }
       UHENC : LL;      { +1C : to end non-code part of Unit   }
       UHZCS : Word;    { +1E : Size of CSEGs (aggregate)      }
       UHZDT : Word;    { +20 : Size of Typed Constant Data    }
       UHZFA : Word;    { +22 : Fix-Up Bytes (CSegs)           }
       UHZFT : Word;    { +24 : Fix-Up Bytes (Typed CONST's)   }
       UHZFV : Word;    { +26 : Size of GLOBAL VAR Data        }
       UHDHT : LL;      { +28 : to Hash Table (DEBUG)          }
       UHSOV : Word;    { +2A : Overlay Involved if non-zero   }
       UHPad : Array[0..9]
                 of Word; { +2C : Reserved for Future Expansion }
     End; { UnitHeader }
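For readers who would rather inspect a unit from a scripting language,
the record above might be decoded along the following lines. This is a
sketch only: it assumes little-endian WORDs, keeps the UHPad area as
20 raw bytes, and uses the hypothetical names FIELDS, HEADER and
parse_unit_header.

```python
import struct

# Field names follow the UnitHeader record, in order of increasing offset.
FIELDS = ("UHEYE UHxxx UHUDH UHIHT UHPMT UHCMT UHTMT UHDMT UHxxy UHLDU "
          "UHLSF UHDBT UHENC UHZCS UHZDT UHZFA UHZFT UHZFV UHDHT UHSOV "
          "UHPad").split()

# Two 4-byte arrays, eighteen WORDs, then the 20-byte pad: 64 bytes in all.
HEADER = struct.Struct("<4s4s18H20s")

def parse_unit_header(image: bytes) -> dict:
    """Decode the first 64 bytes of a unit image into a field dictionary."""
    if len(image) < HEADER.size:
        raise ValueError("image is shorter than a Unit Header")
    header = dict(zip(FIELDS, HEADER.unpack_from(image, 0)))
    if header["UHEYE"] != b"TPU9":
        raise ValueError("signature is not 'TPU9'")
    return header

# Fabricated header whose eighteen WORDs simply count up from 0040h.
demo = HEADER.pack(b"TPU9", bytes(4), *range(0x40, 0x40 + 18), bytes(20))
hdr = parse_unit_header(demo)
print(HEADER.size, hex(hdr["UHUDH"]), hex(hdr["UHENC"]))
```

With a real unit, the argument would be the bytes of the .TPU file
read in binary mode.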
UHEYE contains the characters "TPU9" in that order. This is
clear evidence that this unit was compiled by Turbo Pascal
Version 6.0.
UHxxx is apparently reserved and contains binary zeros.
UHUDH contains an LL (WORD) which points to the Dictionary
Header in which the name of this unit is found.
UHIHT contains an LL (WORD) which points to a Hash table that is
the root of the Interface Dictionary graph.
UHPMT contains an LL (WORD) which points to the PROC Map for
this unit. The PROC Map contains an entry for each
Procedure or Function declared in the unit (except for
INLINE types), plus an entry for the Unit Initialization
section. The length of the PROC Map (in bytes) is
determined by subtracting this UHPMT from UHCMT.
UHCMT contains an LL (WORD) which points to the CSeg (CODE
Segment) Map for this unit. The CSeg Map contains an
entry for each CODE Segment produced by the compiler plus
an entry for each of the CODE Segments included via the
{$L filename.OBJ} compiler directive. The length of this
Map (in bytes) is obtained by subtracting UHCMT from
UHTMT. The result may be zero in which case the CSeg Map
is empty.
UHTMT contains an LL (WORD) which points to the DSeg (DATA
Segment) Map that maps the initializing data for Typed
CONST items plus templates for VMT's (Virtual Method
Tables) that are associated with OBJECTS which employ
Virtual Methods. The length of this Map (in bytes) is
obtained by subtracting UHTMT from UHDMT. The result may
be zero in which case this DSeg Map is empty.
UHDMT contains an LL (WORD) which points to the DSeg (DATA
Segment) Map that contains the specifications for DSeg
storage required by VARiables whose scope is GLOBAL. The
length of this Map (in bytes) is obtained by subtracting
UHDMT from UHLDU, the next LL that is actually used (the
intervening word UHxxy has only been seen to hold zero).
The result may be zero in which case this DSeg Map is empty.
UHxxy Purpose of this word is unknown. No non-zero values have
ever been observed here. (May be for TP-Windows?)
UHLDU contains an LL (WORD) which points to a table of units
which contribute either CODE or DATA Segments to the .EXE
file for a program using this Unit. This is called the
"Donor Unit Table". The length of this table (in bytes)
is obtained by subtracting UHLDU from the word UHLSF. The
result may be zero in which case this table is empty.
UHLSF contains an LL (WORD) which points to a list of "source"
files. These are the files whose CODE or DATA Segments
are included in this Unit by the compiler. Examples are
the Pascal Source for the Unit itself, plus the .OBJ files
included via the {$L filename.OBJ} compiler directive.
The length of this table (in bytes) is obtained by
subtracting UHLSF from the word UHDBT. The result may be
zero in which case this table is empty.
UHDBT contains an LL (WORD) which points to a Trace Table used
by the DEBUGGER for "stepping" through a Function or
Procedure contained in this Unit. The length of this
table (in bytes) is obtained by subtracting UHDBT from the
word UHENC. The result may be zero in which case this
table is empty.
UHENC contains an LL (WORD) which points to the first free byte
which follows the Trace Table (if any). It serves as a
delimiter for determining the size of the Trace Table.
This LL (when rounded up to the next integral multiple of
16) serves to locate the start of the code/data segments.
UHZCS is a WORD that contains the total byte count of all CODE
Segments compiled into this Unit.
UHZDT is a WORD that contains the total byte count of all Typed
CONST and VMT DATA Segments compiled into this unit.
UHZFA is a WORD that contains the total byte count of the Fix-Up
Data Table for this unit for CODE (CSegs).
UHZFT is a WORD that contains the total byte count of the Fix-Up
Data Table for Typed CONST's. This usually implies that a
VMT is getting its pointers relocated.
UHZFV is a WORD that contains the total byte count of all GLOBAL
VAR DATA Segments compiled into this unit.
UHDHT contains an LL (WORD) which points to a Hash Table which
is the root of the DEBUGGER Dictionary. If Local Symbols
were generated by the compiler (directive {$L+}) then ALL
symbols declared in the unit can be accessed from this
Hash Table. If Local Symbols were suppressed there is no
such Dictionary and the LL stored here points to the
INTERFACE Dictionary.
UHSOV Purpose of this word is unknown. It has been observed to
be non-zero when overlay directives are used. So far
however, this hasn't enabled me to come up with a good
guess as to just what the observed values actually mean.
UHPad begins a series of ten (10) words that are apparently
reserved for future use. Nothing but zeros has ever been
seen here by this author.
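The table lengths described in the field notes above all come from
subtracting one LL from the next. A minimal sketch, assuming the header
words are already available in a dictionary keyed by the field names of
the UnitHeader record (the function name table_lengths and the sample
values are made up for illustration):

```python
def table_lengths(h: dict) -> dict:
    """Byte length of each map or list, taken from consecutive header LL's."""
    return {
        "PROC Map":         h["UHCMT"] - h["UHPMT"],
        "CSeg Map":         h["UHTMT"] - h["UHCMT"],
        "CONST DSeg Map":   h["UHDMT"] - h["UHTMT"],
        # Assumption: UHxxy has only been observed to be zero, so UHLDU
        # (the next LL in the header) is taken as the upper bound here.
        "VAR DSeg Map":     h["UHLDU"] - h["UHDMT"],
        "Donor Unit List":  h["UHLSF"] - h["UHLDU"],
        "Source File List": h["UHDBT"] - h["UHLSF"],
        "Trace Table":      h["UHENC"] - h["UHDBT"],
    }

# Fabricated header values, for illustration only.
demo = {"UHPMT": 0x100, "UHCMT": 0x120, "UHTMT": 0x120, "UHDMT": 0x128,
        "UHLDU": 0x130, "UHLSF": 0x140, "UHDBT": 0x150, "UHENC": 0x150}
for name, size in table_lengths(demo).items():
    print(f"{name:17s}{size:4d}" + ("  (empty)" if size == 0 else ""))
```

A zero result marks an empty table, as with the CSeg Map and the Trace
Table in the fabricated values.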
3.2 UNIT SIZE
An independent check on the size of the .TPU file is available using
information contained in the Unit Header. This is also important for
.TPL (Unit Library) organization. To compute the file size, refer to
the five (5) words -- UHENC, UHZCS, UHZDT, UHZFA, and UHZFT. Round
the contents of each of these words to the lowest multiple of 16 that
is greater than or equal to the content of that word. Then form the
sum of the rounded words. This is the .TPU file size in bytes.
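The size check just described can be sketched as below. The header
words are assumed to be held in a dictionary keyed by field name. The
function names and the sample values are illustrative and are not taken
from the supplied programs.

```python
def round_up_16(n: int) -> int:
    """Lowest multiple of 16 that is greater than or equal to n."""
    return (n + 15) & ~15

def expected_unit_size(h: dict) -> int:
    """Sum of the five size-bearing header words, each rounded up to 16."""
    words = ("UHENC", "UHZCS", "UHZDT", "UHZFA", "UHZFT")
    return sum(round_up_16(h[name]) for name in words)

# Fabricated header values, for illustration only.
demo = {"UHENC": 0x153, "UHZCS": 100, "UHZDT": 0, "UHZFA": 17, "UHZFT": 16}
print(expected_unit_size(demo))        # 352 + 112 + 0 + 32 + 16
print(hex(round_up_16(demo["UHENC"]))) # where the code/data segments begin
```

Comparing the result against the actual length of the file gives the
independent check mentioned above.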
4. SYMBOL DICTIONARIES
This area contains all available documentation of declared symbols and
procedure blocks defined within the unit. Depending on compiler
options in effect when the unit was compiled, this section will
contain at a minimum, the INTERFACE declarations, and at a maximum,
ALL declarations. The information stored in the dictionary is highly
dependent on the context of the symbol declared. We defer further
explanation to the appropriate section which follows.
4.1 ORGANIZATION
A dictionary is organized with a Hash Table as its root. The hash
table is used to provide rapid access to identifiers.
A dictionary may be thought of as a directed graph. Each subgraph is
rooted in a hash table. There may be a great many hash tables in a
given unit and their number depends on unit complexity as well as the
options chosen when the unit was compiled. Use of the {$L+} directive
produces the largest dictionaries. The hash tables are explained in
detail a few sections further on.
Hash tables point to Dictionary Headers. When two or more symbols
produce the same hash function result, a collision is said to occur.
Collisions are resolved by the time-honored method of chaining
together the Dictionary Headers of those symbols having the same hash
function result. Dictionary supersetting is accomplished using these
chains.
4.2 INTERFACE DICTIONARY
The INTERFACE dictionary contains all symbols and the necessary
explanatory data for the INTERFACE section of a Unit. Symbols get
added to the Unit using increasing storage addresses until the
IMPLEMENTATION section is encountered.
4.3 DEBUG DICTIONARY
The Debug dictionary (if present) is a superset of the INTERFACE
dictionary. It is used by the Turbo Debugger to support its many
features when tracing through a unit. If present, this dictionary is
rooted in its own hash table. The hash table is effectively
initialized when the IMPLEMENTATION keyword is processed by the
compiler. This takes the form (initially) of an unmodified copy of
the INTERFACE hash table, to which symbols are added in the usual
fashion. Thus, the hash chains constructed or extended at this time
lead naturally to the INTERFACE chains and this is how the superset is
effectively implemented.
4.4 DICTIONARY ELEMENTS
The dictionary contains four major elements. These are: hash tables,
Dictionary Headers, Dictionary Stubs and Type Descriptors. The
distinction between Dictionary Headers and Stubs might appear to be
rather arbitrary. They might just as easily be regarded as a single
element (such as symbol entry). However, the case for the separate
entity approach is strong since Stubs are DIRECTLY addressed via LG's
and -- more to the point -- ONLY by LG's. Thus, it seems reasonable
that this is a separate and very important structure -- at least in
the minds of the architects at Borland.
4.4.1 HASH TABLES
As has been intimated, Hash Tables are the glue that binds the
dictionary entries together and gives the dictionary its "shape".
They effectively implement the scope rules of the language and speed
access to essential information.
Each Hash table begins with a 2-byte size descriptor. This descriptor
contains the number of bytes in the table proper (less 2). Thus, the
descriptor directly points to the last bucket in the hash table. For
a hash table of 128 bytes, the size descriptor contains 126. The
first bucket in the table immediately follows the size descriptor.
4.4.1.1 SIZE
So far, three different hash table sizes have been observed. The
INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in
size plus 2 bytes of size description, but the SYSTEM.TPU unit is a
special case, containing only 16 entries. Hash tables which anchor
subgraphs whose scope is relatively local usually contain four (4)
entries (8 bytes).
Graphically, a Hash Table with four slots has the following layout:
+--------------------+
| 0006h | Size Descriptor
|--------------------|
| slot 0 | an LL or zero
|--------------------|
| slot 1 | an LL or zero
|--------------------|
| slot 2 | an LL or zero
|--------------------|
| slot 3 | an LL or zero
+--------------------+
It should be noted that the Size Descriptor furnishes an upper bound
for the hash function itself. Thus, it seems possible that a single
hash function is used for all hash tables and that its result is ANDed
with the Size Descriptor to get the final result. Because the sizes
are chosen as they are (powers of 2) this is feasible. Note that in
the above example, 6 = 2 * (n - 1) where n = 4 {slot count}. All of
the hash tables observed so far have this property.
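The layout and the masking idea can be sketched in a few lines of
Python. This is an illustration only and is not taken from the
programs supplied with this report: the function names are invented
here, little-endian words are assumed (8086 target), and the ANDing
of the hash value with the Size Descriptor is the conjecture made
above, not a confirmed fact.

```python
import struct

def read_hash_table(buf, offset):
    """Return (size_descriptor, slots) for the hash table at `offset`."""
    (size_desc,) = struct.unpack_from("<H", buf, offset)
    # The descriptor holds the table size in bytes less 2,
    # so the number of 2-byte slots is (descriptor + 2) / 2.
    slot_count = (size_desc + 2) // 2
    slots = struct.unpack_from("<%dH" % slot_count, buf, offset + 2)
    return size_desc, list(slots)

def bucket_offset(raw_hash, size_desc):
    """Byte offset of a bucket within the table proper (conjectured).

    Because the descriptor is 2 * (n - 1) with n a power of 2, the
    result is always an even offset no larger than the descriptor.
    """
    return raw_hash & size_desc

# A four-slot table like the one drawn above (slot values invented).
table = struct.pack("<5H", 0x0006, 0, 0x0120, 0, 0x0200)
size_desc, slots = read_hash_table(table, 0)
```

Here `slots[bucket_offset(h, size_desc) // 2]` would be the LL that
heads the chain for a symbol whose raw hash value is `h`.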
One final note on this subject. Given these properties, "Folding" of
sparse hash tables is a rather trivial exercise so long as the new
hash table also contains a number of slots that is a power of 2. This
point is intriguing when one recalls that the SYSTEM.TPU hash table
has only 16 slots rather than the usual 64.
4.4.1.2 SCOPE
The INTERFACE and Debug dictionary hash tables are Global in Scope
even though the symbols accessed directly via either hash table may be
private. On the other hand, other hash tables are purely local in
scope. For example, the fields declared within a record are reached
via a small local hash table, as are the arguments and local variables
declared within procedures and functions. Even OBJECTS use this
technique to provide access to Methods and Object Fields.
Access to such local scope fields/methods requires use of qualified
names which ensures conformity to Pascal scope rules. The method is
truly simple and elegant.
4.4.1.3 SPECIAL CASES
The SYSTEM.TPU Unit is a special case. Its INTERFACE hash table has
apparently been "hand-tuned" for small size and it contains only
sixteen (16) entries. In addition, since there is no local symbol
generation in this unit, the Debug hash table does not exist as a
separate entity; its function is served by the INTERFACE hash table.
The pointer to the Debug hash table (in the Unit Header) has the same
value as the pointer to the INTERFACE hash table.
4.4.2 DICTIONARY HEADERS
This is the structure that anchors all information known by the
compiler about any symbol. The format is as follows:
+00: An LL which points to the next (previous) symbol in the
same unit which had the same hash function value.
+02: A character that defines the category the symbol belongs
to and defines the format of the Dictionary Stub which
follows the Dictionary Header. If the symbol is declared
in the component list of the "private" part of an Object
declaration, then this character is modified by adding $80
to its ordinal value. Thus, an ordinary Function,
Procedure or Method is of category "S" while a private
Method is of category Chr(Ord('S')+$80).
+03: A String (in the Pascal sense) of variable size that
contains the text of the symbol (in UPPER-CASE letters
only). The SizeOf function is not defined for these
strings since they are truncated to match the symbol size.
The "value" of the SizeOf function can be determined by
adding 1 to the first byte in the string. Thus,
Ord(Symbol[0])+1 is the expression that defines the Size
of the symbol string. Turbo Pascal defines a symbol as a
string of relatively arbitrary size, the most significant
63 characters of which will be stored in the dictionary.
Thus, we conclude that the maximum size of such a string
is 64 bytes.
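A Dictionary Header can be decoded along the following lines. This
is a sketch under stated assumptions, not the code shipped with this
report: the function name and the returned field names are invented
for illustration, and words are assumed to be little-endian.

```python
import struct

def read_dict_header(buf, offset):
    """Decode the Dictionary Header that starts at `offset`."""
    next_ll, cat = struct.unpack_from("<HB", buf, offset)
    name_len = buf[offset + 3]            # length byte of the string
    start = offset + 4
    name = buf[start:start + name_len].decode("ascii")
    return {
        "next": next_ll,                  # LL to next symbol in chain
        "category": chr(cat & 0x7F),      # strip the "private" bit
        "private": bool(cat & 0x80),      # $80 added for private items
        "name": name,
        # The string occupies Ord(Symbol[0]) + 1 bytes, so the
        # Dictionary Stub begins right after it.
        "stub_offset": start + name_len,
    }

# Invented example: a private method named DONE.
raw = struct.pack("<HB", 0x0042, ord("S") + 0x80) + bytes([4]) + b"DONE"
header = read_dict_header(raw, 0)
```

The `stub_offset` value is what a caller would pass on to a decoder
for the stub format selected by `category`.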
4.4.3 DICTIONARY STUBS
Dictionary Stubs immediately follow their respective headers and their
format is determined by the category character in the Dictionary
Header. The function of the stub is to organize the information
appropriate to the symbol and provide a means of accessing additional
information such as type descriptors, constant values, parameter lists
and nested scopes. The format of each Stub is presented in the
following sub-sections.
4.4.3.1 LABEL DECLARATIVES ("O")
This Stub consists of a WORD whose function is (as yet) unknown.
4.4.3.2 UN-TYPED CONSTANTS ("P")
This Stub consists of (2) two fields:
+00: An LG which points to a Type Descriptor (usually in
SYSTEM.TPU). This establishes the minimum storage
requirement for the constant. The rules vary with the
type, but the size of the constant data field (which
follows) is defined using the Type Descriptor(s).
+04: The value of the constant. For ordinal types, this value
is stored as a LONGINT (size=4 bytes). For Floating-Point
types, the size is implicit in the type itself. For
String types, the size is determined from the length of
the string which is stored in the initial byte of the
constant.
4.4.3.3 NAMED TYPES ("Q")
This Stub consists of an LG (4-bytes) that points to the Type
Descriptor for this symbol.
4.4.3.4 VARIABLES, FIELDS, TYPED CONSTS ("R")
This Stub contains information required to allocate and describe these
types of entities. The format and content is as follows:
+00: A one-byte flag that precisely identifies the class of the
item being described. The known values and their apparent
meanings follow:
$00 -> Global Variables (Allocated in DS);
$01 -> Typed Constants (Allocated in DS);
$02 -> Procedure LOCAL Variables on STACK;
$03 -> Variables at Absolute Addresses;
$06 -> ADDRESS Arguments allocated on STACK (this is now
used only for SELF in Method calls);
$08 -> Fields sub-allocated in RECORDS and OBJECTS, plus
METHODS declared for OBJECTS.
$10 -> Variable Equivalenced to another via the
Absolute Clause;
$22 -> Arguments whose VALUEs are passed on the stack;
$26 -> Arguments whose ADDRESSes are passed on the stack.
+01: Two words whose content varies with the codes above. Their
content is explained following the last item in the stub.
+05: An LG that locates the proper Type Descriptor for this
symbol.
When the code byte at +00 is $02, $06, $22 or $26 (stack-resident
locals and arguments), the two words at +01 are used as follows:
+01 Word -- Offset relative to either DS or BP.
+03 Word -- LL to Dict Header of Parent Scope, or zero.
If the code byte is $00 or $01 (VAR's or typed CONSTs), then we have:
+01 Word -- Offset relative to allocation area origin;
+03 Word -- Offset to entry in VAR/CONST Map for item
allocation;
When the code byte is $03 (Absolute Address Variable), then we have:
+01 DWord -- FAR Pointer to Absolute Memory Address.
When the code byte is $08 (Record/Object Fields/Methods), then we
have:
+01 Word -- Allocation Offset within Record/Object;
+03 Word -- LL to next Field/Method.
When the code byte is $10 (Absolute Equivalences), then we have:
+01 DWord -- LG to STUB of variable/parameter declaration that
actually establishes the allocation;
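The case analysis above lends itself to a small decoder. The sketch
below is illustrative only: the names are invented, the LG fields are
kept as four raw bytes (their interpretation belongs to section 2.2),
and the FAR pointer is assumed to be stored in the usual 8086 order
(offset word first, then segment word).

```python
import struct

VAR_CLASSES = {
    0x00: "global variable",
    0x01: "typed constant",
    0x02: "local variable on stack",
    0x03: "absolute-address variable",
    0x06: "address argument on stack (SELF)",
    0x08: "record/object field or method",
    0x10: "absolute equivalence",
    0x22: "value argument on stack",
    0x26: "address argument on stack",
}

def read_var_stub(buf, offset):
    """Decode an "R" stub; the meaning of +01..+04 depends on the flag."""
    flag = buf[offset]
    w1, w2 = struct.unpack_from("<HH", buf, offset + 1)
    info = {
        "class": VAR_CLASSES.get(flag, "unknown"),
        "type_lg": bytes(buf[offset + 5:offset + 9]),   # raw LG
    }
    if flag in (0x02, 0x06, 0x22, 0x26):
        info.update(offset=w1, parent_ll=w2)
    elif flag in (0x00, 0x01):
        info.update(offset=w1, map_offset=w2)
    elif flag == 0x03:
        info.update(far_pointer=(w2, w1))     # (segment, offset), assumed
    elif flag == 0x08:
        info.update(offset=w1, next_ll=w2)
    elif flag == 0x10:
        info.update(target_lg=bytes(buf[offset + 1:offset + 5]))
    return info

# Invented example: a value parameter at BP+6 inside some procedure.
stub = bytes([0x22]) + struct.pack("<HH", 6, 0x0150) + b"\x10\x00\x20\x00"
param = read_var_stub(stub, 0)
```

Flag values not listed above are reported as "unknown" rather than
guessed at.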
4.4.3.5 SUBPROGRAMS & METHODS ("S")
Subprograms (PROC's), especially since Object Methods are supported,
have a rather involved stub. Its format is as follows:
+00: A byte that contains bit-switches that seem to describe
the Call Model and imply the size of this stub. These
switches determine what kind of code (if any) is generated
when the PROC is referenced. The observed values are as
follows:
xxxxx001 -> PROC uses FAR Call Model;
xxxx0010 -> PROC uses INLINE Model (no Call);
xxxx0100 -> PROC uses INTERRUPT Model (no Call);
xxxx100x -> PROC has EXTERNAL attribute;
xxx1xxxx -> PROC uses METHOD Call Model;
x011xxxx -> PROC is a CONSTRUCTOR Method;
x101xxxx -> PROC is a DESTRUCTOR Method;
1xxxxxxx -> PROC has ASSEMBLER directive.
+01: A byte whose function is not yet known. (TP Windows?)
+02: A Word whose interpretation depends on whether or not we
have an INLINE Declarative Subprogram. If this is an
INLINE Declarative Subprogram, then this word contains the
byte-count of the INLINE code text at the end of this
stub. Otherwise, this word is the offset within the PROC
Map that locates the object code for this Subprogram.
+04: A Word that contains an LL which locates the containing
scope in the dictionary, or zero if none.
+06: A Word that contains an LL which locates the local Hash
Table for this scope. A local hash table provides access
to all formal parameters of the Subprogram as well as all
Symbols whose declarations are local to the scope of this
Subprogram.
+08: A Word that is zero unless the symbol is a Virtual Method.
In that case, the content is the offset within the
VMT for the owning object that defines where the FAR
POINTER to this Virtual Method is stored.
+0A: A complete Type-Descriptor for this Subprogram. The
length is variable and depends upon the number of Formal
Parameters declared in the header. (See 4.4.4.3.5).
+??: If this Symbol represents an INLINE Declarative
Subprogram, then the object-code text begins here. The
byte-count of the text occurs at offset 0002h in this
stub.
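The bit-switches and the fixed part of this stub can be decoded as
sketched below. The masks are read directly off the table of observed
values, so they inherit whatever errors that table contains; the
function and field names are invented for illustration.

```python
import struct

def decode_proc_flags(flags):
    """Translate the byte at +00 into a list of attribute names."""
    names = []
    if (flags & 0x07) == 0x01:
        names.append("FAR")            # xxxxx001
    if (flags & 0x0F) == 0x02:
        names.append("INLINE")         # xxxx0010
    if (flags & 0x0F) == 0x04:
        names.append("INTERRUPT")      # xxxx0100
    if (flags & 0x0E) == 0x08:
        names.append("EXTERNAL")       # xxxx100x
    if flags & 0x10:
        names.append("METHOD")         # xxx1xxxx
    if (flags & 0x70) == 0x30:
        names.append("CONSTRUCTOR")    # x011xxxx
    if (flags & 0x70) == 0x50:
        names.append("DESTRUCTOR")     # x101xxxx
    if flags & 0x80:
        names.append("ASSEMBLER")      # 1xxxxxxx
    return names

def read_proc_stub(buf, offset):
    """Decode the fixed ten bytes that precede the Type Descriptor."""
    flags, _unknown, word2, scope_ll, hash_ll, vmt_offset = \
        struct.unpack_from("<BBHHHH", buf, offset)
    info = {
        "attributes": decode_proc_flags(flags),
        "scope_ll": scope_ll,
        "hash_ll": hash_ll,
        "vmt_offset": vmt_offset,
        "type_descriptor_offset": offset + 0x0A,
    }
    # The word at +02 is a byte-count for INLINE subprograms and a
    # PROC Map offset for everything else.
    if "INLINE" in info["attributes"]:
        info["inline_size"] = word2
    else:
        info["proc_map_offset"] = word2
    return info

# Invented example: a FAR constructor whose PROC Map offset is $18.
ctor = read_proc_stub(struct.pack("<BBHHHH", 0x31, 0, 0x18, 0x90, 0x1A0, 0), 0)
```

Locating the INLINE code text still requires stepping over the
variable-length Type Descriptor that begins at +0A.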
4.4.3.6 TURBO STD PROCEDURES ("T")
This Stub consists of two bytes, the first of which is unique for each
procedure and increments by 4. I have found nothing in the SYSTEM
unit (which is where these entries appear) to which this byte seems
directly related. The second byte is always zero.
4.4.3.7 TURBO STD FUNCTIONS ("U")
This Stub consists of two bytes, the first of which is unique for each
function and increments by 4. I have found nothing in the SYSTEM unit
(which is where these entries appear) to which this byte seems
directly related. I wouldn't be surprised if this byte were an index
into a TURBO compiler table that points to specialized parse
tables/action routines for handling these functions and their
non-standard parameter lists.
The second byte seems to be a flag having the values $00, $40 and $C0.
I strongly suspect that the flag $C0 marks exactly those functions
which may be evaluated at compile-time. The meaning behind the other
values is not known to me.
4.4.3.8 TURBO STD "NEW" ROUTINE ("V")
This Stub consists of a WORD whose function is (as yet) unknown. This
is the only Standard Turbo routine that can behave as a procedure as
well as a function (returning a pointer value).
4.4.3.9 TURBO STD PORT ARRAYS ("W")
This Stub consists of a byte whose value is 0 for byte arrays, and 1
for word arrays.
4.4.3.10 TURBO STD EXTERNAL VARIABLES ("X")
This Stub consists of an LG (4-bytes) that points to the Type
Descriptor for this symbol.
4.4.3.11 UNITS ("Y")
Unit Stubs have the following content:
+00: A Word which is apparently reserved for use by the Compiler
or Linker.
+02: A Word that seems to contain some kind of "signature" used
to detect inconsistent Unit Versions. Borland calls this
a "unit version number, which is basically a checksum of
the interface part." I have seen a thread in CIS which
says that it is a CRC value. Food for thought?
+04: A Word that contains an LL which locates the Successor
Unit in the "Uses" list. In fact, the "Uses" lists of
both the INTERFACE and IMPLEMENTATION sections of the Unit
are merged by this Word into a single list. A value of
zero is used to indicate no successor.
+06: A Word that contains an LL which locates the Predecessor
Unit in the "Uses" list. For the SYSTEM unit entry, this
value is always zero to indicate no predecessor. For the
Unit being compiled, this LL locates the final Unit in the
combined "Uses" list.
In effect, the two LL's at offsets 0004 and 0006 organize the units
into both forward and backward linked chains. The entry for the unit
being compiled is effectively the head of both the forward and the
backward chains. The final unit in the merged "Uses" list is the tail
of the forward chain, and the SYSTEM unit is the tail of the backward
chain.
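Walking the forward chain is a simple loop. The sketch below is an
illustration under assumptions: an LL is treated as a plain byte
offset into the buffer holding the unit, the names are invented, and
the sample data is fabricated purely to exercise the code.

```python
import struct

def unit_entry(buf, header_ll):
    """Return (name, successor_ll, predecessor_ll) for a "Y" entry."""
    name_len = buf[header_ll + 3]
    name = buf[header_ll + 4:header_ll + 4 + name_len].decode("ascii")
    stub = header_ll + 4 + name_len        # stub follows the header
    _work, _version, succ, pred = struct.unpack_from("<4H", buf, stub)
    return name, succ, pred

def uses_chain(buf, own_unit_ll):
    """Names on the merged "Uses" list, following the +04 links."""
    names, seen = [], set()
    _, ll, _ = unit_entry(buf, own_unit_ll)
    while ll != 0 and ll not in seen:      # guard against bad links
        seen.add(ll)
        name, ll, _ = unit_entry(buf, ll)
        names.append(name)
    return names

def make_entry(name, succ, pred):
    """Build a fabricated "Y" header plus stub for the demonstration."""
    header = struct.pack("<HB", 0, ord("Y")) + bytes([len(name)]) + name
    return header + struct.pack("<4H", 0, 0, succ, pred)

# MYUNIT at offset 2, SYSTEM at 20, CRT at 38 (all values invented).
image = (b"\x00\x00" + make_entry(b"MYUNIT", 20, 38)
         + make_entry(b"SYSTEM", 38, 0) + make_entry(b"CRT", 0, 20))
chain = uses_chain(image, 2)
```

Following the +06 links instead, starting from the same entry, would
visit the same units in the opposite order and end at SYSTEM.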
4.4.4 TYPE DESCRIPTORS
Type Descriptors store much of the semantic information that applies
to the symbols declared in the unit. Implementation details can be
managed using high-level abstractions and these abstractions can be
shared.
4.4.4.1 SCOPE
Type Descriptor sharing can occur across the boundaries which are
implicit in unit modules. Thus, a type defined in one unit may be
"imported" by some other module. Also, the pre-defined Pascal Types
(plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit
and there needs to be a means of "importing" such Type Descriptors
during compilation. This is precisely the objective of the LG locator
which was described in section 2.2 (above). Type Descriptors are
NEVER copied between units. The binding always occurs by reference at
compile time and this helps support the technique of modifying a unit
and compiling it to a .TPU file, then re-compiling all units/programs
that "USE" it.
Type Descriptors have many roles so their format varies. We have
divided these structures into two parts: the PREFIX Part, which is
always present and whose format is fairly constant, and the SUFFIX
Part, whose content and format depend on the attributes that are part
of the type definition.
4.4.4.2 PREFIX PART
The Prefix Part of every Type Descriptor consists of six (6) bytes.
The usage is consistent for all types observed by this author and the
format is as follows:
+00: A Byte that identifies the format of the Suffix part.
This is essentially based on several high-level categories
which the Suffix Parts support directly. The observed set
of values is as follows:
00h -> an un-typed entity;
01h -> an ARRAY type;
02h -> a RECORD type;
03h -> an OBJECT type;
04h -> a FILE type (other than TEXT);
05h -> a TEXT File type;
06h -> a SUBPROGRAM type;
07h -> a SET type;
08h -> a POINTER type;
09h -> a STRING type;
0Ah -> an 8087 Floating-Point type;
0Bh -> a REAL type;
0Ch -> a Fixed-Point ordinal type;
0Dh -> a BOOLEAN type;
0Eh -> a CHAR type;
0Fh -> an Enumerated ordinal type.
+01: A Byte used as a modifier. Since the above scheme is too
general for machine-dependent details such as storage
width and sign control, this modifier byte supplies
additional data. The author has identified several cases
in which this information is vital but has not spent very
much time on the subject. The chief areas of importance
seem to be in the 8087 Floating-Point types, and the
Fixed-Point ordinal types. The semantics seem to be as
follows:
0A 00 -> The type "SINGLE"
0A 02 -> The type "EXTENDED"
0A 04 -> The type "DOUBLE"
0A 06 -> The type "COMP"
0C 00 -> an un-named BYTE integer
0C 01 -> The type "SHORTINT"
0C 02 -> The type "BYTE"
0C 04 -> an un-named WORD integer
0C 05 -> The type "INTEGER"
0C 06 -> The type "WORD"
0C 0C -> an un-named double-word integer
0C 0D -> The type "LONGINT"
One important feature of the above semantics is the fact
that an un-typed CONST declaration refers to the above two
bytes to determine the storage space needed in the
dictionary for the data value of the constant. This can
be a little involved however as the constant may contain
its own length descriptor (as in a string) in which case
it may be sufficient to identify the high-level type
category without any modifier byte.
+02: A Word that contains the number of bytes of storage that
are required to contain an object/entity of this type.
For types that represent variable-length objects/entities
such as strings, this word may define the value returned
by the SIZEOF function as applied to the type.
+04: A Word that is zero unless the descriptor is for an Object
Method. In this case, the content is an LL to the
Dictionary Header of the SUCCEEDING Method for the Object,
in order of declaration, or zero if none.
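The six-byte Prefix maps directly onto a fixed record. The following
sketch is illustrative only: the tables simply restate the observed
values listed above, and the function and field names are invented.

```python
import struct

CATEGORIES = [
    "un-typed", "ARRAY", "RECORD", "OBJECT", "FILE", "TEXT",
    "SUBPROGRAM", "SET", "POINTER", "STRING", "8087 floating-point",
    "REAL", "fixed-point ordinal", "BOOLEAN", "CHAR", "enumerated",
]

MODIFIERS = {
    (0x0A, 0x00): "SINGLE",   (0x0A, 0x02): "EXTENDED",
    (0x0A, 0x04): "DOUBLE",   (0x0A, 0x06): "COMP",
    (0x0C, 0x00): "un-named byte integer",
    (0x0C, 0x01): "SHORTINT", (0x0C, 0x02): "BYTE",
    (0x0C, 0x04): "un-named word integer",
    (0x0C, 0x05): "INTEGER",  (0x0C, 0x06): "WORD",
    (0x0C, 0x0C): "un-named double-word integer",
    (0x0C, 0x0D): "LONGINT",
}

def read_type_prefix(buf, offset):
    """Decode the Prefix Part of the Type Descriptor at `offset`."""
    cat, mod, size, next_method = struct.unpack_from("<BBHH", buf, offset)
    return {
        "category": CATEGORIES[cat] if cat < len(CATEGORIES) else "unknown",
        "name": MODIFIERS.get((cat, mod)),   # None when not tabulated
        "size": size,                        # storage bytes for the type
        "next_method_ll": next_method,       # zero unless a Method
        "suffix_offset": offset + 6,         # Suffix Part starts here
    }

# Invented example: the prefix one would expect for INTEGER.
prefix = read_type_prefix(bytes([0x0C, 0x05]) + struct.pack("<HH", 2, 0), 0)
```

The `category` value selects which of the Suffix formats applies at
`suffix_offset`.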
4.4.4.3 SUFFIX PARTS
Suffix Parts further refine the implementation details of the type and
also provide subrange constraints where appropriate. In some cases
the Suffix part is empty since all semantic data for the type is
contained in the Prefix part.
4.4.4.3.1 UN-TYPED
This Suffix Part is empty. Nothing is known about an un-typed entity.
4.4.4.3.2 STRUCTURED TYPES
The structured types represent aggregates of lower-level types. We
include ARRAY, RECORD, OBJECT, FILE, TEXT, SET, POINTER and STRING
types in this category.
4.4.4.3.2.1 ARRAY TYPES
The Suffix Part of the ARRAY type is so constructed as to be able to
support recursive or nested definition of arrays. The suffix format
is as follows:
+00: An LG that locates the Type Descriptor for the "base-type"
of the array. This is the type of the entity being
arrayed (which may itself be an array).
+04: An LG that locates the Type Descriptor for the array
bounds which is a constrained ordinal type or subrange.
4.4.4.3.2.2 RECORD TYPES
RECORD types have nested scopes. The Suffix part provides a base
structure by which to locate the fields local to the scope of the
Record type itself. The format is as follows:
+00: A Word containing an LL which locates the local Hash Table
that provides access to the fields in the nested scope.
+02: A Word containing an LL which locates the Dictionary
Header of the initial field in the nested scope. This
supports a "left-to-right" traversal of the fields in a
record.
4.4.4.3.2.3 OBJECT TYPES
OBJECT types also have nested scopes. The Suffix part provides a base
structure by which to locate the fields and METHODS local to the scope
of the OBJECT type itself. In addition, inheritance and VMT
particulars are stored. The format is as follows:
+00: A Word containing an LL which locates the local Hash Table
that provides access to the fields and METHODS local to
the nested scope.
+02: A Word containing an LL which locates the Dictionary
Header of the initial field or METHOD in the nested scope.
This supports a "left-to-right" traversal of the fields
and METHODS in an OBJECT.
+04: An LG which locates the Type Descriptor of the Parent
Object. This field is zero if there is no such Parent.
+08: A Word which contains the size in bytes of the VMT for
this Object. This field is zero if the object employs no
Virtual Methods, Constructors or Destructors.
+0A: A Word which contains the offset within the CONST DSeg Map
that locates the VMT skeleton or template segment. This
field equals FFFFh if the object employs no Virtual
Methods, Constructors or Destructors.
+0C: A Word which contains the offset within an Object instance
where the NEAR POINTER to the VMT for the object is stored
(within the DATA SEGMENT). This field equals FFFFh if the
object employs no Virtual Methods, Constructors or
Destructors.
+0E: A Word which contains an LL which locates the Dictionary
Header for the name of the OBJECT itself.
+10: A Word (not yet understood) containing $FFFF.
+12: Three Words (not yet understood) containing zeroes.
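As a sketch, the OBJECT Suffix can be read as one fixed record of 24
bytes. The names below are invented for illustration, the Parent LG
is kept as raw bytes, and the four trailing words are skipped because
their purpose is not understood.

```python
import struct

OBJECT_FIXED = struct.Struct("<HH4sHHHH")   # fields at +00 .. +0F
OBJECT_SUFFIX_SIZE = 0x18                   # includes words at +10..+17

def read_object_suffix(buf, offset):
    """Decode the Suffix Part of an OBJECT Type Descriptor."""
    (hash_ll, first_ll, parent_lg, vmt_size,
     vmt_map_offset, vmt_ptr_offset, name_ll) = \
        OBJECT_FIXED.unpack_from(buf, offset)
    has_vmt = vmt_size != 0
    return {
        "hash_ll": hash_ll,
        "first_member_ll": first_ll,
        "parent_lg": parent_lg if parent_lg != b"\x00" * 4 else None,
        "has_vmt": has_vmt,
        "vmt_size": vmt_size,
        # FFFFh marks "not applicable" in both offset fields.
        "vmt_map_offset": None if vmt_map_offset == 0xFFFF else vmt_map_offset,
        "vmt_ptr_offset": None if vmt_ptr_offset == 0xFFFF else vmt_ptr_offset,
        "name_ll": name_ll,
        "end_offset": offset + OBJECT_SUFFIX_SIZE,
    }

# Invented example: a root object with no virtual methods.
raw = (OBJECT_FIXED.pack(0x0100, 0x0110, b"\x00" * 4, 0, 0xFFFF, 0xFFFF, 0x00F0)
       + struct.pack("<4H", 0xFFFF, 0, 0, 0))
obj = read_object_suffix(raw, 0)
```

A chain of ancestors could be followed by resolving `parent_lg`
repeatedly until it comes back as `None`.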
4.4.4.3.2.4 FILE (NON-TEXT) TYPES
This Suffix consists of an LG that locates the Type Descriptor of the
base type of the file. Note that the Type Descriptor may be that of
an un-typed entity (for un-typed files).
4.4.4.3.2.5 TEXT FILE TYPES
This Suffix consists of an LG that locates the Type Descriptor of the
base type of the file -- in this case SYSTEM.CHAR.
4.4.4.3.2.6 SET TYPES
This Suffix consists of an LG that locates the base-type of the set
itself. Pascal limits such entities to simple ordinals whose
cardinality is limited to 256.
4.4.4.3.2.7 POINTER TYPES
This Suffix consists of an LG that locates the base-type of the entity
pointed at.
4.4.4.3.2.8 STRING TYPES
This is a special case of an ARRAY type. The format is as follows:
+00: An LG to the Type Descriptor SYSTEM.CHAR which is the base
type of all Turbo Pascal Strings.
+04: An LG to the Type Descriptor for the array bounds
constraints for the string. When the unconstrained STRING
type is used, this points to SYSTEM.BYTE which is defined
as a subrange 0..255.
4.4.4.3.3 FLOATING-POINT TYPES
The Suffix part for all Floating-Point types is EMPTY. All data
needed to specify these approximate number types is contained in the
Prefix part. The Types included in this class are SINGLE, DOUBLE,
EXTENDED, COMP and REAL.
4.4.4.3.4 ORDINAL TYPES
The Ordinal Types consist of the various "integer" types plus the
BOOLEAN, CHAR and Enumerated types.
4.4.4.3.4.1 "INTEGERS"
These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT. Their
Suffix parts are identical in format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor of the largest
upward compatible type. This is the Type Descriptor that
is used to control the width of an un-typed constant in
the dictionary stub. For the "integer" types, this is an
LG to SYSTEM.LONGINT.
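A sketch of reading this twelve-byte Suffix follows. It assumes the
two bounds are signed 32-bit values (which seems necessary if the
lower bound of LONGINT is to be representable) and leaves the LG as
raw bytes; the function name is invented.

```python
import struct

def read_ordinal_suffix(buf, offset):
    """Return (lower_bound, upper_bound, raw_lg) for an ordinal type."""
    low, high = struct.unpack_from("<ll", buf, offset)  # signed dwords
    base_lg = bytes(buf[offset + 8:offset + 12])
    return low, high, base_lg

# Invented example: the bounds one would expect for INTEGER.
suffix = struct.pack("<ll", -32768, 32767) + b"\xAA\xBB\xCC\xDD"
low, high, lg = read_ordinal_suffix(suffix, 0)
```

With the six-byte Prefix included, a complete descriptor of this kind
would occupy eighteen bytes.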
4.4.4.3.4.2 BOOLEANS
This type Suffix has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
There is no "upward compatible" type.
4.4.4.3.4.3 CHARS
This type Suffix has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor SYSTEM.CHAR. There
is no "upward compatible" type.
4.4.4.3.4.4 ENUMERATIONS
This type Suffix is unusual and has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Prefix of the current Type
Descriptor. There is no upward compatible type.
What follows is a full-fledged SET Type Descriptor whose base type is
the Type Descriptor of the Enumerated Type itself. The author has not
yet discovered the reason for this.
At least one case has been observed where a set type descriptor is
followed by a word containing zero but I know of no explanation.
Could this be a (shudder) BUG in Turbo?
4.4.4.3.5 SUBPROGRAM TYPES
The length of this Suffix is variable. The format is as follows:
+00: An LG that locates the Type Descriptor of the FUNCTION
result returned by the Subprogram. This field is zero if
the Subprogram is a PROCEDURE.
+04: A Word that contains the number of Formal Parameters in
the Function/Procedure header. If non-zero, then this
word is followed by the parameter list itself as a simple
array of parameter descriptors.
The format of a parameter descriptor is as follows:
+00: An LG that locates the Type Descriptor of the
corresponding parameter;
+04: A Byte that identifies the parameter passing
mechanism used for this entry as follows:
02h -> VALUE of parameter is passed on STACK,
06h -> ADDRESS of parameter is passed on STACK.
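The variable-length Suffix can be walked as sketched below. The code
assumes that the parameter descriptors are packed back to back at
five bytes apiece with no padding, which follows from the offsets
given but has not been stated explicitly; all names are invented.

```python
import struct

PASSING = {0x02: "VALUE", 0x06: "ADDRESS"}

def read_subprogram_suffix(buf, offset):
    """Decode the Suffix Part of a SUBPROGRAM Type Descriptor."""
    result_lg = bytes(buf[offset:offset + 4])
    (count,) = struct.unpack_from("<H", buf, offset + 4)
    params = []
    pos = offset + 6
    for _ in range(count):
        type_lg = bytes(buf[pos:pos + 4])           # raw LG of the type
        mechanism = PASSING.get(buf[pos + 4], "unknown")
        params.append((type_lg, mechanism))
        pos += 5                                    # 4-byte LG + 1 byte
    return {
        "is_procedure": result_lg == b"\x00" * 4,   # zero LG: no result
        "result_lg": result_lg,
        "params": params,
        "end_offset": pos,
    }

# Invented example: a procedure with one value and one VAR parameter.
raw = (b"\x00" * 4 + struct.pack("<H", 2)
       + b"\x01\x00\x02\x00" + b"\x02"
       + b"\x03\x00\x04\x00" + b"\x06")
proc_type = read_subprogram_suffix(raw, 0)
```

For an INLINE subprogram, `end_offset` would be the place where the
object-code text begins.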
5. MAPS AND LISTS
The "MAPS and LISTS" are not part of the symbol dictionary. Rather,
these structures provide access to the Code and Data Segments produced
by the compiler or included via the {$L name.OBJ} directive. The
format and purpose (as understood by this author) of each of these
tables is explained in the following sections.
5.1 PROC MAP
The PROC Map provides a means of associating the various Function and
Procedure declarations with the Code Segments. There is some evidence
that the Compiler produces CODE (and DATA) Segments for EACH of the
Subprograms defined in the Unit as well as for the un-named Unit
Initialization code block. There is also evidence that EXTERNAL PROCs
must be assembled separately in order to exploit fully the Turbo
"Smart Linker" since Turbo Pascal places some significant restrictions
on EXTERNAL routines in the area of Segment Names and Types.
Specifically, only code segments named "CODE" and data segments named
"DATA" or "CONST" will be used by the "Smart Linker" as sources of
code and data for inclusion in a Turbo Pascal .EXE file. (Turbo 6.0
relaxed the Name constraints, but the limit of one code segment per
.OBJ file remains.)
The first entry in the PROC Map is reserved for the Unit
Initialization block. If there is no Unit Initialization block, this entry will be
filled with $FF. In addition, each and every PROC in the Unit has an
entry in this table.
If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions
in that routine must be declared in the Unit Source Code with the
EXTERNAL attribute.
The size of the PROC Map Table (in Bytes) is implied in the Unit
Header by the LL's that occur at offsets +0C and +0E.
The Format of a single PROC Map Entry is as follows:
+00: A Word presumably reserved as a work area; always zero.
+02: A Word presumably reserved as a work area; always zero.
+04: A Word that contains an offset within the CSeg Map. This
is used to locate the code segment containing the PROC.
+06: A Word that contains an offset within the CODE Segment
that defines the PROC entry point relative to the load
point of the referenced CODE Segment.
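A sketch of reading the whole PROC Map follows. The start and end
offsets are taken as parameters (they would come from the Unit
Header), the names are invented, and "filled with $FF" is taken to
mean that all eight bytes of the entry are FFh.

```python
import struct

def read_proc_map(buf, start, end):
    """Return one item per 8-byte PROC Map entry between start and end.

    Each item is (cseg_map_offset, entry_point_offset), or None for
    an entry filled with $FF (no Unit Initialization block).
    """
    entries = []
    for pos in range(start, end - 7, 8):
        raw = bytes(buf[pos:pos + 8])
        if raw == b"\xFF" * 8:
            entries.append(None)
            continue
        _work0, _work1, cseg_offset, entry_offset = struct.unpack("<4H", raw)
        entries.append((cseg_offset, entry_offset))
    return entries

# Invented example: no initialization block, then two PROCs.
raw = (b"\xFF" * 8
       + struct.pack("<4H", 0, 0, 0, 0x0010)
       + struct.pack("<4H", 0, 0, 8, 0x0000))
procs = read_proc_map(raw, 0, len(raw))
```

The word at +02 of a Subprogram stub (see 4.4.3.5) is an offset into
this table, so `procs[stub_word // 8]` would select the matching
entry under the same assumptions.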
5.2 CSEG MAP
The CSeg Map provides a convenient descriptor table for each CODE
Segment present in the Unit and serves to relate these segments with
the Segment Relocation Data and the Segment Trace Table. It seems
reasonable to infer that the "Smart Linker" is able to include/exclude
code/data at the SEGMENT level only.
The CSeg Map is an array of fixed-length records whose format is as
follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes).
+04: A Word that contains the Length of the Fix-Up Data Table
for this Code Segment (in bytes).
+06: A Word that contains the offset of the Trace Table Entry
for this Segment (if it was compiled with DEBUG Support).
If there is no Trace Table for this segment, then this
Word contains FFFFh.
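The CSeg Map can be read in the same fashion. This is again only a
sketch: the bounds of the table are passed in by the caller and the
field names are invented for illustration.

```python
import struct

def read_cseg_map(buf, start, end):
    """Return a list of dicts, one per 8-byte CSeg Map record."""
    segments = []
    for pos in range(start, end - 7, 8):
        _reserved, length, fixup_len, trace = \
            struct.unpack_from("<4H", buf, pos)
        segments.append({
            "length": length,                # code bytes in the segment
            "fixup_bytes": fixup_len,        # size of its Fix-Up data
            # FFFFh means no Trace Table entry for this segment.
            "trace_offset": None if trace == 0xFFFF else trace,
        })
    return segments

# Invented example: one segment without and one with trace data.
raw = (struct.pack("<4H", 0, 0x0040, 6, 0xFFFF)
       + struct.pack("<4H", 0, 0x0123, 0, 0x0010))
csegs = read_cseg_map(raw, 0, len(raw))
```

Summing the `length` fields gives the total amount of code text the
unit carries, which can serve as a cross-check on the decoding.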
5.3 TYPED CONST DSEG MAP
The CONST DSeg Map provides a convenient descriptor table for each
DATA Segment which was spawned by the presence of Typed Constants or
VMT's in the Pascal Code. It serves to relate these segments with the
Segment Fix-Up (relocation) Data and with the Code Segments that refer
to these DATA elements. One entry is present for each CONST
declaration part containing typed constants and for each CONST segment
linked from an ".OBJ" file. The CONST DSeg Map is an array of fixed-
length records whose format is as follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes).
+04: A Word that contains the Length of the Fix-Up Data Table
for this DATA Segment (in bytes).
+06: A Word that contains an LL which locates the OBJECT that
owns this VMT template or zero if the segment is not a VMT
template.
One can determine the defining block for a Typed Constant declaration
and our program attempts to do just that. A by-product of the
dictionary mapping algorithm allows the declaring block to be found
and its qualified name printed. This information is also used to
explain fix-up data as to its source. Results will be incomplete
unless a really comprehensive dictionary is present in the unit.
5.4 GLOBAL VAR DSEG MAP
The VAR DSeg Map provides a convenient descriptor table for each DATA
Segment present in the Unit.
One entry exists for each CODE segment which refers to GLOBAL VAR's
allocated in the DATA Segment. These references may be seen in the
Fix-Up Data Table. Each EXTERNAL CSeg having a segment named DATA
also spawns an entry in this table. Only the Code Segments that meet
these criteria cause entries to be generated in the VAR Dseg Map.
The VAR DSeg Map is an array of fixed-length records whose format is
as follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes). This
may be zero, especially if the EXTERNAL routine contains a
DATA segment whose sole purpose is to declare one or more
EXTRN symbols that are defined in some DATA segment
external to the Assembly.
+04: A Word apparently reserved for use by TURBO.
+06: A Word apparently reserved for use by TURBO.
One can determine the defining block for a Global VARiable declaration
and our program attempts to do just that. A by-product of the
dictionary mapping algorithm allows the declaring block to be found
and its qualified name printed. This information is also used to
explain fix-up data as to its source. Results will be incomplete
unless a really comprehensive dictionary is present in the unit. Such
DSegs can be referenced by many CSegs and we only locate the first
one. This is okay for Pascal code but it's ambiguous for assembler
since the names may be PUBLIC and referenced by more than one module.
5.5 DONOR UNIT LIST
This list contains an entry for each Unit (taken from the "USES" list)
which MAY contribute either CODE or DATA to the executable file. Not
all units do make such a contribution as some exist merely to define a
collection of Types, etc. A Unit gets into this list if there exists
at least one Fix-Up Data Entry that references CODE or DATA in that
Unit. The list consists of elements whose SIZE is variable and whose
format is as follows:
+00: A WORD apparently reserved for use by TURBO.
+02: A variable-length String containing the unit name.
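A minimal Python sketch of walking such a list follows. It assumes
that the "variable-length String" is a Pascal-style string (a length
byte followed by the characters) and that entries follow one another
with no padding; neither point is confirmed above. The function name
and sample bytes are hypothetical. The entry offset is kept because
the Fix-Up Data Table refers to donor units by their offset within
this list.

```python
import struct

def parse_donor_units(raw):
    """Return (entry_offset, reserved_word, unit_name) for each entry."""
    units = []
    pos = 0
    while pos < len(raw):
        (reserved,) = struct.unpack_from("<H", raw, pos)
        name_len = raw[pos + 2]              # assumed Pascal length byte
        name = raw[pos + 3:pos + 3 + name_len].decode("ascii")
        units.append((pos, reserved, name))
        pos += 3 + name_len                  # assumed: no padding between entries
    return units

# Hypothetical sample holding two unit names.
sample = b"\x00\x00\x06SYSTEM" + b"\x00\x00\x03DOS"
units = parse_donor_units(sample)
print(units)
```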
5.6 SOURCE FILE LIST
This list contains an entry for each "source" file used to compile the
Unit. This includes the Primary Pascal file, files containing Pascal
code included by means of the {$I filename.xxx} compiler directive,
and .OBJ files included by the {$L filename.OBJ} compiler directive.
The order of entries in this list is critical since it maps the CODE
segments stored in the unit. The order of the entries is as follows:
1) The Primary Pascal file;
2) All Included Pascal files;
3) All Included .OBJ files.
Mapping of CSegs to files is done as follows:
a) Each .OBJ file contributes a SINGLE Code Segment (if any).
Note that this author has not observed an .OBJ module that
contains only a DATA Segment (but that seems a distinct
possibility).
b) The Primary Pascal file (augmented by all included Pascal
Files) contributes zero or more CODE Segments.
Therefore, there are at least as many CSeg entries as .OBJ files. If
more, then the excess entries (those at the front of the list) belong
to the Pascal files that make up the Pascal source for the unit.
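The counting argument above can be sketched in Python as follows.
This is an illustration under stated assumptions: the flag value 05h
for .OBJ files comes from the entry format given in this section, the
.OBJ CSegs are assumed to appear in the same order as the .OBJ files,
and the function name is hypothetical.

```python
OBJ_FLAG = 0x05  # flag byte for an .OBJ file (see the entry format)

def map_csegs_to_files(cseg_count, source_files):
    """source_files: (flag, name) pairs in Source File List order.

    Returns one item per CSeg: None for a CSeg that belongs to the
    Pascal source, or the .OBJ file name for a CSeg that came from it.
    """
    obj_files = [name for flag, name in source_files if flag == OBJ_FLAG]
    pascal_count = cseg_count - len(obj_files)
    if pascal_count < 0:
        raise ValueError("fewer CSegs than .OBJ files")
    # Excess entries at the front of the list belong to the Pascal files.
    return [None] * pascal_count + obj_files

files = [(0x04, "DEMO.PAS"), (0x03, "DEMO.INC"),
         (0x05, "A.OBJ"), (0x05, "B.OBJ")]
owners = map_csegs_to_files(5, files)
print(owners)
```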
The format of an entry in this list is as follows:
+00: A flag byte that indicates the type of file represented;
04h -> the Primary Pascal Source File,
03h -> an Included Pascal Source File,
05h -> an .OBJ file that contains a CODE segment.
+01: A Word apparently reserved for use by the Compiler/Linker.
+03: A Word that is zero for .OBJ files and which contains the
file directory time-stamp for Pascal Files.
+05: A Word that is zero for .OBJ files and which contains the
file directory date-stamp for Pascal Files.
+07: A variable-sized string containing the filename and
extension of the file used during compilation.
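The entry format above can be decoded with a short Python sketch. It
assumes that the time and date words use the standard packed MS-DOS
directory format (hours/minutes/2-second units and
year-1980/month/day) and that the file name is a Pascal-style string
with a leading length byte. The function names and sample bytes are
hypothetical.

```python
import struct

FILE_KINDS = {0x04: "primary Pascal", 0x03: "included Pascal", 0x05: ".OBJ"}

def dos_time(t):
    """Unpack an MS-DOS time word into (hour, minute, second)."""
    return (t >> 11, (t >> 5) & 0x3F, (t & 0x1F) * 2)

def dos_date(d):
    """Unpack an MS-DOS date word into (year, month, day)."""
    return (1980 + (d >> 9), (d >> 5) & 0x0F, d & 0x1F)

def parse_source_files(raw):
    files, pos = [], 0
    while pos < len(raw):
        flag, _reserved, time_w, date_w, name_len = struct.unpack_from(
            "<BHHHB", raw, pos)
        name = raw[pos + 8:pos + 8 + name_len].decode("ascii")
        # Both words are zero for .OBJ files, so no stamp is reported.
        stamp = None
        if time_w or date_w:
            stamp = (dos_date(date_w), dos_time(time_w))
        files.append((FILE_KINDS.get(flag, hex(flag)), name, stamp))
        pos += 8 + name_len
    return files

# Hypothetical sample: DEMO.PAS stamped 1991-04-16 06:00:00, plus DEMO.OBJ.
sample = (struct.pack("<BHHHB", 0x04, 0, 0x3000, 0x1690, 8) + b"DEMO.PAS"
          + struct.pack("<BHHHB", 0x05, 0, 0, 0, 8) + b"DEMO.OBJ")
parsed = parse_source_files(sample)
print(parsed)
```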
5.7 DEBUG TRACE TABLE
If Debug support was selected at compile time, then all Pascal code
which supports Debugging produces an entry in this table. The table
entries themselves are variable in size and have the following format:
+00: A Word which contains an LL that locates the Dictionary
Header of the Symbol (a PROC name) this entry represents.
+02: A Word which contains the offset (within the Source File
List) of the entry that names the file that generated the
CSeg being traced. This allows the file included by means
of the {$I filename} directive to be identified for DEBUG
purposes, as well as code produced from the Primary File.
+04: A Word containing the number of bytes of data that precede
the BEGIN statement code in the segment. For Pascal PROCS
these bytes consist of literal constants, un-typed
constants, and other data such as range-checking limits,
etc.
+06: A Word containing the Line Number of the BEGIN statement
for the PROC.
+08: A Word containing the number of lines of Source Code to
Trace in this Segment.
+0A: An array of bytes whose size is at least the number of
source code lines in the PROC. Each byte contains the
number of bytes of object code in the corresponding source
line. This appears to be an array of SHORTINT: if a
"line" contains more than 127 bytes, a single escape byte
of $80 precedes the actual byte count, and the next byte
records the count (up to 255 bytes) for the line. This
situation has not yet been fully explored. We do not yet
know what happens when a line is credited with spawning
more than 255 bytes of code.
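The $80 escape rule can be sketched in Python as follows. This is a
guess at a decoder based only on the description above. The behavior
for lines over 255 bytes, and for unescaped bytes above $80, is
unknown and is not handled. The sketch also assumes that the first
count belongs to the BEGIN line and that code starts right after the
leading data bytes. All names are hypothetical.

```python
ESCAPE = 0x80  # marks a count that does not fit in 0..127

def decode_line_sizes(raw, line_count):
    """Return (sizes, bytes_used) for line_count source lines."""
    sizes, pos = [], 0
    while len(sizes) < line_count:
        count = raw[pos]
        pos += 1
        if count == ESCAPE:        # the real count is in the next byte
            count = raw[pos]
            pos += 1
        sizes.append(count)
    return sizes, pos

def line_table(first_line, code_start, sizes):
    """Pair each source line with its assumed code offset and size."""
    table, offset = [], code_start
    for index, size in enumerate(sizes):
        table.append((first_line + index, offset, size))
        offset += size
    return table

# Hypothetical sample: four lines, the third generating 200 bytes.
raw = bytes([3, 0, 0x80, 200, 5])
sizes, used = decode_line_sizes(raw, 4)
table = line_table(first_line=10, code_start=6, sizes=sizes)
print(sizes, used, table)
```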
6. CODE, DATA, FIX-UP INFO
This area begins at the start of the next free PARAGRAPH (a 16-byte
boundary). This means that its offset from the beginning of the Unit,
written in hexadecimal, ALWAYS ends in the digit zero.
This area contains the CODE segments, CONST DATA segments, and the
Relocation (Fix-Up) Data required for linking.
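The paragraph alignment used throughout this area amounts to rounding
an offset up to a multiple of 16. A minimal Python sketch follows;
the function names are hypothetical.

```python
PARAGRAPH = 16  # bytes in one 8086 paragraph

def next_paragraph(offset):
    """Round offset up to a paragraph boundary (unchanged if aligned)."""
    return (offset + PARAGRAPH - 1) & ~(PARAGRAPH - 1)

def slack(offset):
    """Number of filler bytes needed to reach the next boundary."""
    return next_paragraph(offset) - offset

print(hex(next_paragraph(0x1234)), slack(0x1234))
```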
6.1 OBJECT CSEGS
Each CODE segment included in the unit appears here as specified by
the CSeg Map Table. Depending on usage, these segments may appear in
the executable file. There are no filler bytes between segments.
6.2 CONST DSEGS
This section begins at the start of the first free PARAGRAPH following
the end of the Object CSegs. This means that its offset from the
beginning of the Unit, written in hexadecimal, ALWAYS ends in the
digit zero.
A DATA segment fragment appears here for each CSeg that declares a
typed constant, and for each OBJECT which employs Virtual Methods,
Constructors or Destructors. There are no filler bytes between
segments.
If local symbols were generated, there is always enough information to
allow documenting the scope of the declaration as well as interpreting
the data in the display since the needed type declarations would also
be available. Our program merely identifies the defining block.
6.3 FIX-UP DATA TABLE
This table begins at the start of the first free PARAGRAPH following
the end of the CONST DSegs. This means that its offset from the
beginning of the Unit, written in hexadecimal, ALWAYS ends in the
digit zero. There are two
sections in this table: one for code, and one for data. Both
sections are aligned on paragraph boundaries. This may result in a
"slack" entry between the code and data sub-sections, but this entry
is included in the byte tally for the section stored in the Unit
Header Table at UHZFA (offset +22).
The table begins with entries for the CSeg Map and ends with entries
for the CONST DSeg Map. The appropriate Map entry specifies the
number of bytes of Relocation Data for the corresponding segment.
This number may be zero in which case there is no Relocation Data for
the given segment.
The Table consists of an array of eight-byte entries whose format is
as follows:
+00: A Byte containing the offset within the Donor Unit List of
the Unit name that this entry refers to. This can be the
compiled Unit or some previously compiled external unit.
+01: A Byte of BIT switches that identify the type of reference
and the size of the needed fix-up (WORD or DWORD). A lot
of guess-work led to the following interpretation:
7654 (bits 3-0 don't seem to be used)
00-- Locate item via a PROC Map,
01-- Locate item via a CSeg Map,
10-- Locate item via a Global VAR DSeg Map,
11-- Locate item via a Const DSeg Map,
--00 WORD offset has NO effective address adjustment,
--01 WORD offset HAS an effective address adjustment,
--10 WORD SEGMENT-Only fix-up (address of some PUBLIC
segment),
--11 DWORD (FAR) pointer; possible effective address
adjustment.
+02: A Word containing the offset within the Map table
referenced according to the above code scheme.
+04: A Word containing an offset within the target segment
which will be added to the effective address. For
example, a reference to the VAR DSeg Map will require a
final offset to locate the item (variable) within the DATA
SEGMENT being referenced here. This may also be needed
for references to LITERAL DATA embedded in a CODE SEGMENT.
+06: A Word containing the offset within the CODE or DATA
segment owning this entry that contains the area to be
patched with the value of the final effective address.
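The eight-byte entry and its bit switches can be decoded with the
Python sketch below. It follows the author's interpretation of the
bit switches, which the text itself describes as guess-work, and the
function name and sample bytes are hypothetical.

```python
import struct

# Bits 7-6 of the switch byte select the map used to locate the item.
MAP_KINDS = ["PROC Map", "CSeg Map", "Global VAR DSeg Map", "CONST DSeg Map"]
# Bits 5-4 select the kind and size of the fix-up.
FIXUP_KINDS = ["WORD offset, no adjustment",
               "WORD offset, with adjustment",
               "WORD segment-only",
               "DWORD (FAR) pointer"]

def parse_fixup(entry):
    """Decode one 8-byte Fix-Up Data Table entry."""
    unit_ofs, flags, map_ofs, target_ofs, patch_ofs = struct.unpack(
        "<BBHHH", entry)
    return {
        "donor_unit_offset": unit_ofs,       # offset into Donor Unit List
        "map": MAP_KINDS[flags >> 6],
        "fixup": FIXUP_KINDS[(flags >> 4) & 0x03],
        "map_offset": map_ofs,               # offset within the chosen map
        "target_offset": target_ofs,         # added to the effective address
        "patch_offset": patch_ofs,           # where the owning segment is patched
    }

# Hypothetical entry: a global VAR at offset 4 of its DATA segment,
# patched into the owning segment at offset 12h.
fixup = parse_fixup(bytes.fromhex("00 90 08 00 04 00 12 00"))
print(fixup)
```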
7. SUPPLIED PROGRAM
In order that the above information be made constructively useful, the
author has designed a program that automates the process of discovery.
It is not a "handsome" program and it is not a work of art. It does
give useful results provided your PC has enough available memory.
It should be obvious that the program was not designed "top-down".
Rather, it just evolved as each new discovery was made. Later on, it
seemed reasonable to try to document some of the relations between the
various lists and tables and the program tries to make some of these
relations clear, albeit with varying degrees of success.
7.1 TPU6
This is the main program. It will ask for the name of the unit to be
documented. Reply with the unit name only. The program will append
the ".TPU" extension and will search for the proper file. It will
also search TURBO.TPL if necessary.
The program will then ask if Dis-Assembly is desired and will require
a "y" or "n" answer. If "y", it also asks about the CPU.
The current directory will be searched first, followed by all
directories in the current PATH. If the .TPU file is not found, the
program will search for it in the "TURBO.TPL" (Turbo Pascal Library)
file. Units in the "USES" list(s) will also be loaded to enable
resolution of LG items.
If the desired unit is found, the program will write a report to the
current directory named "unitname.lst" which contains its analysis.
The format of the report is such that it may be copied to a printer if
that printer supports TTY control codes with form-feeds. Be judicious
in doing this however since there can be a lot of information. The
Turbo SYSTEM.TPU unit file produces almost ninety (90) pages without
the disassembly option. When disassembly is requested for the SYSTEM
unit, the size of the output file exceeds 700K bytes.
7.1.1 UNIT TPU6AMS
This Unit contains all Type Definitions, Structures, and primitive
Functions and Procedures required by the program. All structures
documented in this report are also documented in TPU6AMS by means of
the TYPE mechanism. Some of the structures are difficult if not
impossible to handle using ISO Pascal but Turbo Pascal provides the
means for getting the job done.
7.1.2 UNIT TPU6EQU
This Unit is new and contains constants and types of general utility
that are not strictly unit related. It also contains the pointer
manipulation routines that depend on the particular compiler version
(here, Turbo Pascal Version 6.0). It also contains a Heap Error Function that
keeps track of the high-water mark of Heap Utilization of any program
that uses it. This function gets installed automatically.
7.1.3 UNIT TPU6UTL
This Unit is new. It contains the higher-level analysis algorithms
formerly located in the main program and in TPU6AMS. The algorithms
have been re-cast with object-orientation in mind and have potential
for re-use in other contexts. The unit computes a cover for the
dictionary and deduces relationships between dictionary, code, data
and the CSeg, PROC, CONST and VAR Maps discussed in section 5. This
information is retrieved by the main program to drive the printing
process.
This Unit also loads all units specified in the USES list of the prime
unit to allow the names of externally defined types to be recovered on
the report. Array bounds are also retrieved in this way. The code
will search for needed units in TURBO.TPL without intervention. Close
attention is paid to Heap Management and minimal utilization of Heap
storage. The dictionary areas of the Units located in the USES list
get loaded into the Heap at no extra charge. Nothing but the
dictionary area is of any use at this point. The name and fully-
qualified file name of each unit successfully loaded are printed at
the top of the listing. Unit version numbers must agree or the unit
will not be loaded. Dictionary covers are computed for each loaded
unit to aid in rapid LG-resolution.
7.1.4 UNIT TPU6RPT
This is a Unit that contains the text-file output primitives required
by the main program. It's not very pretty but it does work.
7.1.5 UNIT TPU6UNA
This unit is a rudimentary disassembler. The output will not assemble
and may look strange to a "real" assembler programmer since I am not
well-qualified in this area. However, the basis for support of 80286,
80386 etc. processors is present as well as coprocessor support. Of
perhaps the greatest interest is that it does appear to decode the
emulated coprocessor instructions that are implemented via INT 34h
through INT 3Dh.
Be warned however. The output is not guaranteed since this was coded
by myself and I am perhaps the rankest amateur that ever approached
this quite awful assembler language. For convenience, the operand
coding mimics TASM "Ideal" mode.
As is usual with programs of this type, error-recovery is minimal and
no context checking is performed. If the operation code is found to
be valid, then a valid instruction is assumed -- even if invalid
operands are present.
The only positives that apply to this program are that it doesn't slow
the cpu down (although a lot more output is produced), and it does let
one "tune" code for compactness by letting one view the results of the
coding directly. Also, incomplete instructions are handled as data
rather than overrunning into the next proc.
7.2 MODIFICATIONS
It was intended from the beginning that this program should be able to
be enhanced to permit external units to be referenced during the
analysis of any given unit, even if they were library components.
Since the original release of this document, the program has been so
enhanced.
This program was NOT intended as a pilot for some future product. It
WAS intended as a rather "ersatz" tool for myself.
7.3 NOTES ON PROGRAM LOGIC
The following sections discuss a few of the methods employed by the
supplied program.
7.3.1 FORMATTING THE DICTIONARY
Printing the unit dictionary area in a way that exposes its underlying
semantics is no small task. The unit dictionary area itself is a
rather amorphous-looking mass of data composed of hash tables,
dictionary headers and stubs, type descriptors, etc. In order to
present all this information in a meaningful way, we have to reveal
its structure and this cannot be done by means of a sequential
"browse" technique. Rather, we have to visit all nodes in the
dictionary area so that each may be formatted in a way that exposes
its function and meaning. This is made necessary by the fact that
items are added to the dictionary as encountered and no convenient
ordering of entry types exists. What we have here is the problem of
finding a minimal "cover" for the dictionary area that properly
exposes the content and structure of the dictionary area.
To do this, we construct (in the heap) a stack and a queue, both of
which are initially empty. The entries we put in the stack identify
the class of entry (Hash Table, Dictionary Header, Type Descriptor or
In-Line Code group), the location of the structure, and the location
of its immediate "owner" or "parent" dictionary entry (which allows
some limited information about scope to be printed).
To the empty stack, we add an entry for the unit name dictionary
entry, the INTERFACE hash table, and the Debug hash table. All these
are located via direct pointers (LL's) in the Unit Header Table. We
then pop one entry off the stack and begin our analysis.
a) If the entry we popped off the stack is not present in the
queue, we add it and call a routine that can interpret the entry
(aka, "cover") for a Dictionary Header, Hash Table, or Type
Descriptor. (This may lead to additional entries being added to
the stack such as nested-scope hash tables, Dictionary Headers,
Type Descriptors or In-Line Code group entries.)
b) While the stack is not empty, we pop another entry and repeat
step "a" (above) until no more entries are available.
The result is a queue containing one entry for each structure in the
unit dictionary area that is identifiable via traversal. (In
practice, the method we use is a non-recursive traversal of an n-way
tree; because pending entries are kept on a stack, the visiting order
is closer to "depth-first" than to "breadth-first".) Each
entry in the queue contains the information described above and the
queue itself thus forms a set of descriptors that drive the process of
formatting the dictionary area for display. The process may be
likened to "painting by the numbers" or to finding a way to lay tile
on a flat surface using tiles of four different irregular shapes until
the floor is exactly covered.
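The stack-and-queue procedure described above can be sketched in
Python as follows. This is an illustration, not the logic of TPU6UTL
itself. Entries are modeled as (class, location, parent) tuples, the
interpret callback stands in for the routine that decodes one
structure and reports what it points to, and the toy dictionary is
invented for the example.

```python
def compute_cover(roots, interpret):
    """Return the queue of entries reachable from the root entries."""
    stack = list(roots)       # work still to be done
    queue = []                # one entry per structure, in visiting order
    seen = set()              # (class, location) pairs already queued
    while stack:
        entry = stack.pop()
        kind, location, _parent = entry
        if (kind, location) in seen:
            continue          # already covered via another path
        seen.add((kind, location))
        queue.append(entry)
        stack.extend(interpret(entry))   # may add nested structures
    return queue

# Toy dictionary area: location -> (class, locations it points to).
toy = {
    0x10: ("header", [0x40]),
    0x20: ("hash", [0x30, 0x50]),
    0x30: ("header", [0x40]),
    0x40: ("type", []),
    0x50: ("header", [0x40, 0x60]),
    0x60: ("hash", []),
}

def interpret(entry):
    _kind, location, _parent = entry
    return [(toy[child][0], child, location) for child in toy[location][1]]

roots = [("header", 0x10, None), ("hash", 0x20, None)]
cover = compute_cover(roots, interpret)
# Sorting by location gives the order in which the area is formatted.
covered = sorted(location for _kind, location, _parent in cover)
print([hex(location) for location in covered])
```

Note that the type descriptor at 40h is reached from three headers in
this toy example, so the parent recorded for it depends on visiting
order.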
There is one significant limitation that needs to be pointed out. It
is not always possible to determine the "parent" or "owner" of a node
with certainty. The following discussion illustrates the problem of
finding the "real" parent of a Type Descriptor.
Almost every "type" in Turbo Pascal is actually derived from the basic
types that are defined in the SYSTEM.TPU unit -- e.g. "INTEGER",
"BYTE", etc. In addition, several of the Type Descriptors in the
SYSTEM unit are referenced by more than one Dictionary Entry. Thus,
we find that a "many-to-one" relationship may exist between Dictionary
Entries and Type Descriptors. How does one find out which is the
entry that actually gave rise to the Type Descriptor?
The Dictionary Area of a unit has some special properties, one of
which is the fact that the Dictionary Entries for named Types are
often located quite near their primary type descriptors. The
Dictionary Area seems to be treated as an upward growing heap with the
various structures being added by Turbo as encountered. This makes it
likely that the Type "Q" header which gives rise to a type descriptor
occurs earlier in the Dictionary Area than any other header which
refers to the same descriptor. We take advantage of this
property to allocate "ownership" but it may not be "fool-proof". Some
type descriptors are spawned by other type descriptors, especially for
structured types. We don't attempt to allocate "ownership" to these
"lower-level" descriptors but we do try to keep track of scope
information.
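The "earliest header wins" heuristic can be sketched in Python as
shown below. The function name and the sample offsets are
hypothetical, and as noted above the rule may not be fool-proof.

```python
def assign_owners(references):
    """references: (header_offset, descriptor_offset) pairs.

    Returns descriptor_offset -> offset of the earliest header that
    refers to it, which is taken to be the descriptor's owner.
    """
    owners = {}
    for header, descriptor in references:
        if descriptor not in owners or header < owners[descriptor]:
            owners[descriptor] = header
    return owners

# Three headers share the descriptor at 40h; the one at 10h is earliest.
refs = [(0x50, 0x40), (0x10, 0x40), (0x30, 0x40), (0x70, 0x90)]
owners = assign_owners(refs)
print(owners)
```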
A useful by-product of the above process is the ability to discover
many of the associations between Global Variables, Typed CONST's,
VMT's and the blocks in which they are declared or defined.
7.3.2 THE DISASSEMBLER
To start with, I apologize up front for mistakes which are bound to be
present in this routine. I am not really a MASM or TASM programmer
and I will not pretend otherwise. This being the case, the formatting
I have chosen for the operands may be erroneous or misleading and
might (if submitted to one of the "real" assemblers) produce object
code quite different from what is expected. I hope not, but I have to
admit it's possible.
My intention in adding this unit was to support hand-tuning of object
code. With practice and some effort, one can observe the effect on
the object module caused by specific Pascal coding. Thus, where
compactness or speed is an issue of paramount importance, TPU6UNA can
be of help. In some cases, a simple re-arrangement of the local
variable declarations in a procedure can have a significant effect on
the size of the code if it means the difference between 1 and 2-byte
displacements for each instruction that references a specific local
variable. Potential applications along these lines seem almost
unlimited.
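The displacement effect can be illustrated with a small Python
calculation. The sketch assumes that locals are allocated downward
from BP in declaration order, which is not established in this
report, and it uses the 8086 rule that a BP-relative displacement
fits in one byte only when it lies in the range -128..127. All names
are hypothetical.

```python
def layout_locals(decls):
    """decls: (name, size) pairs in declaration order.

    Returns name -> BP-relative offset under the assumed allocation rule.
    """
    offsets, offset = {}, 0
    for name, size in decls:
        offset -= size
        offsets[name] = offset
    return offsets

def disp_size(offset):
    """Bytes of displacement needed to encode [BP+offset]."""
    return 1 if -128 <= offset <= 127 else 2

# A 200-byte buffer declared before or after a frequently used integer.
buffer_first = layout_locals([("buf", 200), ("i", 2)])
integer_first = layout_locals([("i", 2), ("buf", 200)])
print(disp_size(buffer_first["i"]), disp_size(integer_first["i"]))
```

Under these assumptions, declaring the integer first saves one byte
in every instruction that references it.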
I adopted an operand format not unlike that of TASM "Ideal" mode since
it was more convenient to do so and looked more readable to me. I
relied on several reference books for guidance in decoding the entire
mess and I found that there were several flaws (read ERRORS) in some
of them which made the job that much more difficult. I then
compounded my problems by attempting to handle 80386 specific code
even though Turbo Pascal does not yet generate code specific to these
processors. I simply felt that the effort involved in writing any
sort of Dis-Assembly program for Turbo Pascal units was an effort best
experienced not more than once. With all this self-flagellation out
of my system once and for all, I will try to show the basic strategy
of the program and to explain the limitations and some of the
discoveries I made.
The routine is intended to be idiotically simple - i.e., no smarter
than the DEBUG command in principle. The basic idea is: pass some
text to the routine and get back ONE line derived from some prefix of
that text. Repeat as necessary until all text is gone. Thus, there
is no attempt to check the context of the text being processed. Also,
some configurations of the "modR/M" byte may be invalid for selected
instructions. I don't try to screen these out since the intent was to
look at the presumably correct code produced by TURBO Pascal -- not
devious assembly language. Also, this program regards WAIT operations
as "stand-alone" -- i.e., it doesn't check to see if a coprocessor
operation follows for which the WAIT might be regarded as a prefix.
One area of real difficulty was figuring out the Floating-Point
emulations used by Turbo Pascal that are implemented by means of
interrupts $34 through $3D. I don't know if I got it right, but the
results seem reasonable and consistent. In the listing, the Interrupt
is produced on one line, followed by its parameters on the next line.
The parameter line is given the op-code "EMU_xxxx" where "xxxx" is the
coprocessor op-code I felt was being emulated. Interrupt $3C was a
real puzzler but after seeing a lot of code in context, I think that
the segment override is communicated to the emulator by means of the
first byte after the $3C.
Normally, in a non-emulator environment, all coprocessor operations
(ignoring any WAIT prefixes) begin with $D8-$DF. What Borland (and
maybe Microsoft) seem to have done here is to change the $D8-$DF so
that bits 7 and 6 of this byte are replaced with the one's complement
of the 2-bit segment register number found in various 8086
instructions. This seems to be how an override for the DS register is
passed to the emulator. I don't KNOW this to be the correct
interpretation, but the code I have examined in context seems to work
under this scheme, so TPU6UNA uses it to interpret the operand
accordingly.
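The interpretation described above can be expressed as a short Python
sketch. It implements only the author's hypothesis about the byte
that follows the $3C, using the 8086 segment register numbering
(ES=0, CS=1, SS=2, DS=3). The function name is hypothetical.

```python
SEGMENT_REGS = ["ES", "CS", "SS", "DS"]  # 8086 2-bit register numbers

def decode_emulated_esc(byte):
    """Return (segment_override, coprocessor_opcode) for an emulator byte.

    Bits 7-6 hold the one's complement of the segment register number;
    restoring them to 11 yields the original $D8-$DF operation code.
    """
    seg_number = ~(byte >> 6) & 0x03
    opcode = 0xC0 | (byte & 0x3F)
    return SEGMENT_REGS[seg_number], opcode

for sample in (0x19, 0x59, 0x99, 0xD9):
    seg, opcode = decode_emulated_esc(sample)
    print("%02X -> %s: %02X" % (sample, seg, opcode))
```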
For 80x86 machines, the problem was somewhat simpler. TPU6UNA takes a
quick look at the first byte of the text. Almost any byte is valid as
the initial byte of an instruction, but some instructions require more
than one byte to hold the complete operation code. Thus, step 1
classifies bytes in several ways that lead to efficient recognition of
valid operation codes.
Once the instruction has been identified in this way, it is more or
less easy to link to supplemental information that provides operand
editing guidance, etc.
The tables that embody the recognition scheme were constructed using
PARADOX 3.0 (another fine Borland product) and suitably coded queries
were used to generate the actual Turbo Pascal code for compilation.
For those that are interested, TPU6UNA supports the address-size and
operand-size prefixes of the 80386 as well as 32-bit operands and
addresses but remember that Turbo Pascal doesn't generate these.
Provision is made for a trivial change that allows segments which
default to 32-bit mode to be handled as well.
There is a simple mode variable that gets passed to TPU6UNA by its
caller which specifies the most-capable processor whose code is to be
handled. Codes are provided for the 8086 (8088 is the same), 80186
(same as 80286 without protected mode instructions), 80286 (80186 plus
protected mode), and 80386. You now get asked which one to use.
No such specifier is provided for coprocessor support. What is there
is what I think an 80387 supports. I don't think that this is really
a problem if you don't try to use TPU6UNA for anything but Turbo
Pascal code.
Error recovery is predictably simple. The initial text byte is output
as the operand of a DB pseudo-op and provision is made to resume work
at the next byte of text.
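The one-line-at-a-time strategy and the DB fallback can be sketched
in Python as follows. The tiny operation code table is a stand-in for
the real recognizer tables in TPU6UNA and covers only four
instructions. All names are hypothetical.

```python
# opcode byte -> (output template, total instruction length in bytes)
OPCODES = {
    0x55: ("PUSH    BP", 1),
    0x90: ("NOP", 1),
    0xCB: ("RETF", 1),
    0xCD: ("INT     0{1:02X}h", 2),
}

def unassemble_one(text):
    """Return (line, bytes_consumed) for some prefix of text."""
    entry = OPCODES.get(text[0])
    if entry is None or entry[1] > len(text):
        # Unknown or incomplete instruction: emit one byte as data.
        return "DB      0{:02X}h".format(text[0]), 1
    template, length = entry
    return template.format(*text[:length]), length

def unassemble(text):
    """Repeat unassemble_one until all text is gone."""
    lines = []
    while text:
        line, used = unassemble_one(text)
        lines.append(line)
        text = text[used:]
    return lines

listing = unassemble(bytes([0x55, 0x0F, 0xCD, 0x21, 0xCB, 0xCD]))
print("\n".join(listing))
```

The final byte in the sample is an INT with no operand, so it is
reported as data rather than read past the end of the text.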
I hope this program is found to be useful in spite of the errors it
must surely contain. I have yet to make much sense of the rules for
MASM or TASM operand coding and I found very little of value in many
of the so-called "texts" on the subject. I found myself in the
position of that legendary American in England watching a Cricket
match for the first time ("You mean it has RULES?").
8. UNIT LIBRARIES
I have examined .TPL files in passing and feel that their structure is
trivial. It's so easy to handle them that the program now routinely
examines TURBO.TPL to resolve named types.
8.1 LIBRARY STRUCTURE
A Turbo Pascal Library (.TPL) file is a simple catenation of Turbo
Pascal Unit (.TPU) files. Since the length of a Unit may be
determined from the Unit Header (see section 3.1), it is simple to see
that one may "browse" through a .TPL file looking for an external unit
such as SYSTEM.TPU. The supplied program does just that in its unit
retrieval process so the TPUMOVER utility is no longer required for
processing of units in TURBO.TPL.
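Browsing a library in this way can be sketched in Python as shown
below. The real unit length and unit name come from the Unit Header
and dictionary described earlier in this report, so the sketch takes
them as callbacks. The toy_length, toy_name and toy_unit helpers use
an invented layout purely so that the example runs; they do NOT
reflect the real header. Any alignment of units within the library is
not addressed.

```python
import struct

def find_unit(tpl, wanted, unit_length, unit_name):
    """Scan catenated units; return (offset, unit_bytes) or None."""
    pos = 0
    while pos < len(tpl):
        length = unit_length(tpl, pos)
        if length <= 0:                 # guard against a corrupt header
            break
        if unit_name(tpl, pos).upper() == wanted.upper():
            return pos, tpl[pos:pos + length]
        pos += length                   # step to the next unit
    return None

# Invented toy layout: length word, then a Pascal-style name string.
def toy_length(buf, pos):
    return struct.unpack_from("<H", buf, pos)[0]

def toy_name(buf, pos):
    size = buf[pos + 2]
    return buf[pos + 3:pos + 3 + size].decode("ascii")

def toy_unit(name, body):
    raw = bytes([len(name)]) + name.encode("ascii") + body
    return struct.pack("<H", len(raw) + 2) + raw

library = toy_unit("SYSTEM", b"\x00" * 5) + toy_unit("DOS", b"\x01" * 3)
found = find_unit(library, "dos", toy_length, toy_name)
print(found[0], len(found[1]))
```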
9. APPLICATION NOTES
One of the more obvious applications of this information would seem to
be in the area of a Cross-Reference Generator.
There is a very fine example of such a program in the public domain
that was written by Mr. R. N. Wisan called "PXL". This program has
been around since the days of Turbo Pascal Version 1. The program has
been continually enhanced by the author in the way of features and for
support of the newer Turbo Pascal versions. It does not however solve
the problem of telling one which unit contains the definition of a
given symbol. In fairness to "PXL" however, this is no small problem
since the format of .TPU files keeps changing (Turbo 6.0 Units are
not object-code compatible with Turbo 5.x Units, and so on...) and
Mr. Wisan probably has more than enough other projects to keep himself
occupied.
However, for the user who is willing to work a little (maybe a lot?),
this document would seem to provide the information needed to add such
a function to his own pet cross-reference generator.
Further, with SIGNIFICANTLY more effort, it should be possible to do
much of the job of de-compilation -- provided the DEBUG dictionary is
present. At the very least, most declarations should be recoverable.
It's another thing entirely to try to reconstruct plausible TURBO
Pascal code from the CSegs. This would be a formidable task and lots
of knowledge about TURBO's code generators would have to be acquired.
At present, the only way I know to get this information is to have the
run-time library source codes and then work-work-work at testing code
produced by the compiler for a huge number of test case units. You
have to want to do this really badly in order to invest the time. I
am not that tired of living.
10. ACKNOWLEDGEMENTS
This project would have been totally infeasible without the aid of
some very fine tools. As it was, several hundred man hours have been
expended on it and as you can see, there are a few unresolved issues
that have been (graciously) left for others to address. The tools
used by this author consisted of:
1) Turbo Pascal 6.0 Professional by Borland International
2) Microsoft WORD (version 5.0)
3) LIST (version 7.5) by Vernon D. Buerg
4) the DEBUG utility in MS-DOS Version 3.3.
5) PARADOX 3.0 by Borland International
6) QUATTRO PRO by Borland International
7) TURBO ASSEMBLER 1.1 by Borland International
(PARADOX and QUATTRO PRO were used for data collection and analysis in
the course of coding the recognizer tables for the disassembler unit.)
The references listed were of great value in this project. [Intel85]
was a valuable source of information about coprocessor instructions as
well as offering hints about the differences between the 8086/8088 and
the 80286. The [Borland] TASM manuals offered further info on the
80186. [Nelson] provided presentations of well-organized data
directed at the problem of disassembly but the tables were flawed by a
number of errors which crept into my databases and which caused much
of the extra debugging effort. [Intel89] offered valuable insights on
the 80386 addressing schemes as well as the 32-bit data extensions.
Finally, [Brown] provided valuable clues on the Floating-Point
emulators used by Borland (and Microsoft?). As you can see, the
amount of hard information available to me on this project was quite
limited since I am unaware of any other existing body of literature on
this subject.
That's it folks. Does anyone wonder why it took several hundred man
hours to get to this point? It took a lot of hard (and at times
tedious) work coupled with a great many lucky guesses to achieve what
you see here.
11. REFERENCES
[Borland], TURBO ASSEMBLER REFERENCE GUIDE, Borland International,
1988.
[Borland], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988.
[Borland], TURBO PASCAL 6.0 PROGRAMMING GUIDE, Borland International,
1990.
[Borland], TURBO PASCAL LIBRARY REFERENCE Version 6.0, Borland
International, 1990.
[Borland], TURBO PASCAL USER'S GUIDE Version 6.0, Borland
International, 1990.
[Brown], INTER191.ARC, Ralf Brown, 1991.
[Intel85], iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX
286 NUMERIC SUPPLEMENT, Intel Corporation, 1985, (order
number 210498-003).
[Intel89], 386 SX MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL, Intel
Corporation, 1989, (order number 240331-001).
[Nelson], THE 80386 BOOK: ASSEMBLY LANGUAGE PROGRAMMER'S GUIDE FOR
THE 80386, Ross P. Nelson, Microsoft Press, 1988.
[Scanlon], 80286 ASSEMBLY LANGUAGE ON MS-DOS COMPUTERS, Leo J.
Scanlon, Brady, 1986.
INDEX
.OBJ file 12, 13, 30, 31, 33
.TPL file 6, 14, 37, 38, 43
.TPU
file 5, 7, 11, 14, 23, 37, 43, 44
size 14
SYSTEM 6, 16, 17, 18, 23, 37, 40, 43
Assembler 6
Attribute
ABSOLUTE 7
EXTERNAL 20, 30
Call Model
ASSEMBLER 20
FAR 20
INLINE 20
INTERRUPT 20
CONST 6, 11, 12, 13, 19, 24, 26, 31, 35, 36, 38
Constraint 28, 29
CSeg 6, 11, 12, 30, 31, 32, 33, 34, 35, 36, 38
Defining block 31, 32
Directive 12, 13, 14, 20, 30, 33, 34
External 7, 30, 32, 36, 39, 43
Hash 11, 12, 13, 14, 15, 16, 17, 20, 25, 26, 39, 40
Include 33, 34
Interface 6, 11, 12, 13, 14, 15, 16, 17, 22, 40
Locator
LG 7, 10, 18, 19, 21, 23, 25, 26, 27, 28, 29
LL 7, 11, 16, 22, 30, 40
offset 7, 9, 10, 19, 20, 26, 30, 31, 34, 35, 36
Method 20
CONSTRUCTOR 20
DESTRUCTOR 20
Self 19
Operand offset 36
Parameter 18, 19, 20, 21, 29
PROC 6, 11, 12, 20, 30, 34, 36, 38, 39
SEGMENT 36
Signature 5, 22
Stub 7, 17, 18, 19
Type Descriptor 18, 19, 21, 23, 25, 26, 27, 28, 29, 40, 41
VAR 32, 38
VMT 12, 13, 20, 26, 31